Appendix: Complete Annotated Workflow and Implementation Details

A Hands-On Journey Through Pharmacometric Analysis with Pumas

Show/Hide Code
using Pumas
using PumasUtilities
using DataFramesMeta
using CSV
using SummaryTables
using CairoMakie
using AlgebraOfGraphics
using Random
Show/Hide Code
rng = Random.seed!(5438)

1 Introduction: Your Guide to Integrated Pharmacometric Analysis

Welcome to this comprehensive, hands-on tutorial that demonstrates the complete Pumas workflow for continuous data modeling. This appendix accompanies the main article by providing details needed to reproduce the analysis from scratch—every line of code, every modeling decision, and every diagnostic check.

1.1 What You’ll Learn

This tutorial walks through the PKPD workflow, from raw clinical trial data through to simulation-based dose recommendations. This document demonstrates how Pumas integrates each stage of the workflow:

Data PreparationExploratory AnalysisNon-Compartmental AnalysisPopulation PK ModelingPKPD IntegrationSimulation & Decision Support

Along the way, you’ll discover not just what to do, but why each step matters for robust drug development decisions.

📖 About This Dataset

Throughout this tutorial, we analyze data from a synthetic multiple ascending dose (MAD) study designed to mirror real-world Phase 1 clinical trials:

  • 60 subjects across 6 dose levels (placebo + 100, 200, 400, 800, 1600 mg)
  • Pharmacokinetics: Rich sampling on Days 1 and 6, sparse trough sampling between
  • Pharmacodynamics: Biomarker measurements focused at steady state
  • Covariates: Sex and baseline body weight

The data was generated using known ground truth parameters (see 1_data_simulation.qmd), allowing us to validate our modeling approach. However, we’ll analyze it exactly as if it came from a real clinical trial—you won’t need to know the true parameters to follow along!

1.2 How to Use This Appendix

This document is designed to be:

  • Executable: Every code block runs in sequence—copy and paste to reproduce
  • Educational: Extensive explanations reveal the reasoning behind each choice
  • Practical: Tips, warnings, and best practices guide you toward real-world applications
  • Modular: Each section builds on previous work but can also serve as a standalone reference
💡 Reading Strategies

First-time learners: Read sequentially, executing each code block to build intuition

Experienced pharmacometricians: Jump to specific sections using the table of contents—each analysis stage is self-contained

Reference seekers: Use section cross-references to find detailed implementations

1.3 The Workflow Stages

Our analysis follows the pharmacometric workflow presented in the main article (Figure 1). The appendix sections map to these stages:

Stage Description Appendix Section
1 Data Preparation & Quality Checks A.2 (#sec-data-prep)
2 Exploratory Data Analysis A.2 (#sec-data-prep)
3 Non-Compartmental Analysis A.3 (#sec-nca)
4-8 Population PK Model Development A.4 (#sec-pop-pk)
9-10 Parameter Inference & Uncertainty A.5 (#sec-uncertainty)
11 Pharmacodynamic Modeling A.6 (#sec-pkpd)
12 Simulation for Decision Support A.7 (#sec-simulation)

Let’s begin our journey with the most fundamental step: preparing our data for analysis.


1.4 A.2 Data Preparation and Quality Assessment

In any pharmacometric analysis, data preparation is where rigor begins. Before we fit a single model, we must ensure our data is clean, well-structured, and ready for analysis. This section demonstrates how Pumas transforms raw clinical trial data into analysis-ready population objects while maintaining data integrity at every step.

1.4.1 The Starting Point: Loading Raw Data

Our journey begins with a CSV file exported from the clinical database—the standard format for pharmacometric analyses. Let’s load it and take our first look:

Show/Hide Code
path = joinpath(@__DIR__, "data", "intro_paper_data.csv")
wide_data = CSV.read(path, DataFrame)
first(wide_data, 5)
5×14 DataFrame
Row ID NOMTIME PROFTIME PROFDAY AMT EVID CMT ROUTE CONC PD_CONC SEX WEIGHTB DOSE TRTACT
String7 Float64 Float64 Float64 Float64 Int64 String7? String3 Float64? Float64? String7 Float64 Int64 String7
1 0_1 -0.1 0.0 0.0 0.0 0 missing ev 0.0 49.19 Male 83.31 0 Placebo
2 0_1 0.0 0.0 1.0 0.0 1 Depot ev missing missing Male 83.31 0 Placebo
3 0_1 0.1 0.1 1.0 0.0 0 missing ev 0.0 missing Male 83.31 0 Placebo
4 0_1 0.5 0.5 1.0 0.0 0 missing ev 0.0 missing Male 83.31 0 Placebo
5 0_1 1.0 1.0 1.0 0.0 0 missing ev 0.0 missing Male 83.31 0 Placebo
📊 Understanding the Data Structure

This dataset follows the “wide format” convention where each observation type has its own column (CONC for PK, PD_CONC for PD). This differs from “long format” where all observations share a single column with a type identifier.

Why wide format matters: Pumas requires wide format. To read more on the data format requirements, see the Data Representation Tutorial.

At this stage, we’re looking for red flags:

  • Missing required columns (ID, TIME, AMT, EVID)?
  • Unexpected data types (text where we expect numbers)?
  • Implausible values (negative concentrations, impossible times)?

The first() function gives us a quick preview. Everything looks reasonable—let’s proceed to create our analysis populations.

1.4.2 Creating Analysis-Specific Populations

Here’s a key Pumas philosophy: One dataset, multiple populations. Different analyses require different data subsets and structures. We’ll create three populations, each optimized for its intended use:

  1. NCA population: Days 1 & 6 only, active doses, rich PK sampling
  2. NLME population (active): All time points, active doses, both PK and PD
  3. NLME population (with placebo): Same as #2 but includes placebo for covariate checks

1.4.2.1 Prepare NCA Population

Non-compartmental analysis requires complete concentration-time profiles without gaps. We filter strategically:

Show/Hide Code
df_nca = deepcopy(wide_data)

# Filter to relevant rows: non-negative nominal time, active doses only, PK data only
@rsubset! df_nca begin
    :NOMTIME >= 0
    :DOSE > 0
    :PROFDAY  [1, 6]  # First dose and steady state
end

# Transform nominal time from hours to days and add route information
@rtransform! df_nca begin
    :NOMTIME = :NOMTIME / 24  # NCA functions expect days
    :ROUTE = "ev"  # extravascular (oral) administration
end

# Define units for NCA analysis
timeu = u"d"      # days
concu = u"ng/mL"  # nanograms per milliliter
amtu = u"mg"      # milligrams

1.4.2.2 Create NCA Population Object

Now we create the NCA population object:

Show/Hide Code
population_nca = read_nca(
    df_nca;
    id=:ID,
    time=:PROFTIME,  # Time since last dose
    observations=:CONC,
    amt=:AMT,
    route=:ROUTE,
    group=[:PROFDAY, :DOSE]  # Stratify by day and dose for comparisons
)
NCAPopulation (50 subjects):
  Group: [["PROFDAY" => 1.0, "DOSE" => 100], ["PROFDAY" => 1.0, "DOSE" => 200], ["PROFDAY" => 1.0, "DOSE" => 400], ["PROFDAY" => 1.0, "DOSE" => 800], ["PROFDAY" => 1.0, "DOSE" => 1600], ["PROFDAY" => 6.0, "DOSE" => 100], ["PROFDAY" => 6.0, "DOSE" => 200], ["PROFDAY" => 6.0, "DOSE" => 400], ["PROFDAY" => 6.0, "DOSE" => 800], ["PROFDAY" => 6.0, "DOSE" => 1600]]
  Number of missing observations: 100
  Number of blq observations: 0

The group argument tells Pumas we want to compare NCA metrics across dose levels and between Day 1 vs. Day 6.

1.4.2.3 Prepare NLME Populations

For nonlinear mixed-effects modeling, we need the full time course including dosing events:

Show/Hide Code
# population with placebo data (for covariate balance checks)
population_placebo = read_pumas(
    wide_data;
    id=:ID,
    time=:NOMTIME,
    observations=[:CONC, :PD_CONC],  # Multi-endpoint: both PK and PD
    amt=:AMT,
    cmt=:CMT,  # Compartment for dosing
    evid=:EVID,  # Event ID: 1=dose, 0=observation
    covariates=[:WEIGHTB, :SEX, :TRTACT, :DOSE]
)

# population without placebo data (for PKPD model fitting)
population = read_pumas(
    filter(r -> r.DOSE != 0, wide_data);  # Exclude placebo
    id=:ID,
    time=:NOMTIME,
    observations=[:CONC, :PD_CONC],
    amt=:AMT,
    cmt=:CMT,
    evid=:EVID,
    covariates=[:WEIGHTB, :SEX, :TRTACT, :DOSE]
)
Warning: Your dataset has bolus doses with zero dose amount.
@ Pumas none:2327
Population
  Subjects: 50
  Covariates: WEIGHTB, SEX, TRTACT, DOSE
  Observations: CONC, PD_CONC
💡 Why Two NLME Populations?

With placebo (population_placebo): Use this to verify demographic balance across treatment groups. Imbalances in sex or weight distributions could confound covariate analyses.

Without placebo (population): Use this for PKPD model fitting. Placebo subjects have zero drug concentration—they don’t inform PK parameters and can cause numerical issues during estimation.

1.4.3 What Just Happened? Input Validation

When you call read_pumas(), Pumas validates your data immediately:

  • ✓ Is time monotonically increasing within each subject?
  • ✓ Do all observation columns exist?

A comprehensive list is provided in the Data Representation Tutorial.

If anything is wrong, you get a clear error message pointing to the specific problem—before you waste hours on model fitting.

This is input validation at work: catch errors early when they’re easy to fix, not after overnight estimation runs.

1.4.4 Data Preparation Checklist ✅

Before moving to exploratory analysis, verify:

Next step: Now that our data is properly structured, let’s explore it visually to understand what we’re modeling.

1.5 A.2 (continued) Exploratory Data Analysis

Before fitting models, it’s essential to understand the structure and characteristics of the data. This corresponds to Stage 2 in the workflow map (Figure 1). Exploratory data analysis (EDA) helps identify potential issues (missing data, outliers, data entry errors), understand covariate distributions, and visualize concentration-time profiles.

1.5.0.1 Tables vs. Figures: DataFrames vs. Populations

Pumas provides flexible tools for EDA that work with both raw DataFrames and structured Population objects:

Summary Tables

  • Work with processed DataFrames (e.g., datasets prepared for non-compartmental or for compartmental analysis)
  • Created using SummaryTables.jl functions like overview_table(), table_one(), listingtable() and summarytable()
  • Ideal for demographic summaries, covariate distributions, and data quality checks

Visualizations

  • Can be created from either DataFrames or Population objects:

    • DataFrames: Use AlgebraOfGraphics.jl + CairoMakie.jl for full customization
    • Populations: Use PumasUtilities.jl plotting functions for pharmacometric-specific visualizations

PumasUtilities Plotting Functions

  • Work with both Population (NLME) and NCAPopulation (NCA) objects
  • Some functions are universal (e.g., observations_vs_time()), while others are analysis-specific (e.g., summary_observations_vs_time() for the case of NCA)
  • Built on top of AlgebraOfGraphics.jl, so plots can be easily extended and customized using the same syntax

Workflow Strategy

In this section, we demonstrate both approaches to explore PK and PD endpoints and understand covariate relationships. We’ll clearly indicate whether we’re using [DataFrame] or [Population] for each example, allowing you to choose the approach that best fits your workflow. Generally:

  • Use DataFrames when you need custom aggregations, filtering, or transformations before plotting
  • Use Populations when you want pharmacometric-specific plots with minimal code

1.5.1 Summary Tables

1.5.1.1 Overview Table [DataFrame]

One useful function to ensure the data was read correctly and to understand its structure is overview_table() from the SummaryTables.jl package:

Show/Hide Code
overview_table(wide_data)
Table 1
No Variable Stats / Values Freqs (% of Valid) Graph Valid Missing
1 ID
[String7]
1. 0_6
2. 1600_8
3. 100_5
4. 1600_7
5. 800_9
6. 800_10
7. 0_3
8. 0_1
9. 400_1
10. 0_7
[50 others]
33 (1.7%)
33 (1.7%)
33 (1.7%)
33 (1.7%)
33 (1.7%)
33 (1.7%)
33 (1.7%)
33 (1.7%)
33 (1.7%)
33 (1.7%)
1650 (83.3%)
1980
(100%)
0
(0%)
2 NOMTIME
[Float64]
Mean (sd): 80.2 (64)
min ≤ med ≤ max:
-0.1 ≤ 95.9 ≤ 216
IQR (CV): 112 (0.798)
33 distinct values 1980
(100%)
0
(0%)
3 PROFTIME
[Float64]
Mean (sd): 14.7 (21.6)
min ≤ med ≤ max:
0 ≤ 4 ≤ 95.9
IQR (CV): 23.8 (1.46)
14 distinct values 1980
(100%)
0
(0%)
4 PROFDAY
[Float64]
Mean (sd): 3.91 (2.56)
min ≤ med ≤ max:
0 ≤ 4 ≤ 9
IQR (CV): 5 (0.656)
10 distinct values 1980
(100%)
0
(0%)
5 AMT
[Float64]
Mean (sd): 93.9 (308)
min ≤ med ≤ max:
0 ≤ 0 ≤ 1600
IQR (CV): 0 (3.27)
6 distinct values 1980
(100%)
0
(0%)
6 EVID
[Int64]
Mean (sd): 0.182 (0.386)
min ≤ med ≤ max:
0 ≤ 0 ≤ 1
IQR (CV): 0 (2.12)
2 distinct values 1980
(100%)
0
(0%)
7 CMT
[String7?]
1. Depot 360 (100%) 360
(18.2%)
1620
(81.8%)
8 ROUTE
[String3]
1. ev 1980 (100%) 1980
(100%)
0
(0%)
9 CONC
[Float64?]
Mean (sd): 9.28 (13.6)
min ≤ med ≤ max:
0 ≤ 4.06 ≤ 110
IQR (CV): 10.1 (1.46)
947 distinct values 1620
(81.8%)
360
(18.2%)
10 PD_CONC
[Float64?]
Mean (sd): 72.1 (29.4)
min ≤ med ≤ max:
43.6 ≤ 58.8 ≤ 198
IQR (CV): 30.4 (0.407)
1005 distinct values 1140
(57.6%)
840
(42.4%)
11 SEX
[String7]
1. Female
2. Male
1056 (53.3%)
924 (46.7%)
1980
(100%)
0
(0%)
12 WEIGHTB
[Float64]
Mean (sd): 79.6 (5.89)
min ≤ med ≤ max:
69 ≤ 78.2 ≤ 93.2
IQR (CV): 8.67 (0.0739)
60 distinct values 1980
(100%)
0
(0%)
13 DOSE
[Int64]
Mean (sd): 517 (549)
min ≤ med ≤ max:
0 ≤ 300 ≤ 1600
IQR (CV): 700 (1.06)
6 distinct values 1980
(100%)
0
(0%)
14 TRTACT
[String7]
1. 200mg
2. Placebo
3. 400mg
4. 100mg
5. 1600mg
6. 800mg
330 (16.7%)
330 (16.7%)
330 (16.7%)
330 (16.7%)
330 (16.7%)
330 (16.7%)
1980
(100%)
0
(0%)
Dimensions: 1980 x 14
Duplicate rows: 0

Comprehensive data quality overview showing column types, summary statistics, unique value counts, distribution visualizations, and missing data proportions for all variables in the analysis dataset.

As we can see, it provides:

  • Column names and data types
  • Summary statistics (mean, median, min, max)
  • Number of unique values
  • Inline plots (sparklines) that depend on the column type
  • Proportion of missing values

This single function gives a comprehensive overview of the entire dataset, making it easy to spot issues like unexpected missing data, incorrect data types, or unusual value ranges.

1.5.1.2 Demographics Table [DataFrame]

For demographic and covariate information, we create stratified summaries. The table_one() function provides publication-ready summary tables stratified by grouping variables:

Show/Hide Code
table_one(
    unique(wide_data, :ID),
    [:WEIGHTB => "Baseline Weight [kg]", :SEX => "Sex [Female/Male]"],
    groupby=:SEX => "Sex",
    show_n=true
)
Table 2: Baseline demographic characteristics stratified by sex. Continuous variables presented as mean ± SD, categorical variables as n (%). Sample includes unique subjects from the analysis population.
Sex
Total
(n=60)
Female
(n=32)
Male
(n=28)
Baseline Weight [kg]
Mean (SD) 79.6 (5.93) 75.3 (2.67) 84.6 (4.65)
Median [Min, Max] 78.2 [69, 93.2] 75.8 [69, 79.7] 84 [74.9, 93.2]
Sex [Female/Male]
Female 32 (53.3%) 32 (100%) 0 (0%)
Male 28 (46.7%) 0 (0%) 28 (100%)

This produces a “Table 1” style summary commonly used in clinical publications, showing:

  • Sample sizes for each group
  • Mean ± SD for continuous variables (WEIGHTB)
  • Counts and percentages for categorical variables (SEX)
Advanced Tables (ⓐ)

For more customized summary tables, including tables stratified by multiple variables, custom summary functions, or complex formatting, see the documentation for SummaryTables.jl package that offers extensive customization options for publication-quality tables.

1.5.2 PK Data Visualization

Visual exploration of concentration-time profiles is essential for understanding PK characteristics including dose proportionality, accumulation, and between-subject variability.

1.5.2.1 Mean Concentration-Time Profiles by Dose [Population]

We begin by visualizing the average concentration-time profile for each dose level. This helps identify overall trends and assess whether the study design captured key PK features (absorption, distribution, elimination).

First, we need to load the plotting packages:

Show/Hide Code
using PumasUtilities
using AlgebraOfGraphics
using CairoMakie

1.5.2.2 PK Profiles by Day and Dose

For multiple ascending dose studies, comparing first dose versus steady-state profiles reveals drug accumulation patterns.

Show/Hide Code
# Create mean concentration profile by dose and time
df_mean = @chain wide_data begin
    @rsubset :DOSE != 0
    @by [:PROFDAY, :TRTACT, :PROFTIME] begin
        :Cmean = mean(:CONC)
        :Cmedian = median(:CONC)
    end
end
@rsubset! df_mean :PROFDAY  [1, 6]

# Plot using AlgebraOfGraphics
dose_renamer = renamer(map(d -> d => "$(Int(d)) mg", unique(wide_data.DOSE)))
day_renamer = renamer(map(d -> d => "Day $(Int(d))", unique(df_mean.PROFDAY)))
Show/Hide Code
# First dosing interval (e.g., Day 1: 0-24 hours)
plt_by_day = draw(data(@rsubset(df_mean, !ismissing(:Cmean))) *
                  mapping(:PROFTIME, :Cmean; color=:TRTACT => "Dose",
                      layout=:PROFDAY => day_renamer) *
                  visual(Lines, linewidth=6),
    axis=(;
        xlabel="Time (hours)",
        ylabel="Mean Concentration (ng/mL)",
        yscale=log10,
        ytickformat=x -> string.(round.(x; digits=1)),
        yticks=[0.1, 1, 2, 4, 10, 20, 40, 80, 100],
        ygridwidth=3,
        yminorgridcolor=:lightgrey,
        yminorticksvisible=true,
        yminorgridvisible=true,
        yminorticks=IntervalsBetween(10),
    ),
    legend=(; position=:bottom),
    figure=(; size=(1600, 800), fontsize=28)
)
Figure 1: Mean plasma concentration-time profiles comparing Day 1 (first dose) and Day 6 (approaching steady state) across dose levels. Log-scale y-axis enables comparison of elimination phases. Accumulation is evident as higher Day 6 concentrations relative to Day 1 at equivalent times post-dose.

or one can change the visualization to observe different doses in each panel and the accumulation as a group

Show/Hide Code
# First dosing interval (e.g., Day 1: 0-24 hours)
plt_by_dose = draw(data(@rsubset(df_mean, !ismissing(:Cmean))) *
                   mapping(:PROFTIME, :Cmean; color=:PROFDAY => day_renamer,
                       layout=:TRTACT => "Dose") *
                   visual(Lines, linewidth=6),
    axis=(;
        xlabel="Time (hours)",
        ylabel="Mean Concentration (ng/mL)",
        yscale=log10,
        ytickformat=x -> string.(round.(x; digits=1)),
        yticks=[0.1, 1, 2, 4, 10, 20, 40, 80, 100],
        ygridwidth=3,
        yminorgridcolor=:lightgrey,
        yminorticksvisible=true,
        yminorgridvisible=true,
        yminorticks=IntervalsBetween(5),
    ),
    legend=(; position=:bottom),
    figure=(; size=(1600, 1200), fontsize=28)
)
Figure 2: Mean concentration-time profiles faceted by dose level, comparing Day 1 versus Day 6 within each panel. This layout emphasizes accumulation patterns at each dose while maintaining dose-level comparisons. Color distinguishes study days within each dose panel.

Comparing these two panels helps identify accumulation—if steady-state concentrations are substantially higher than first-dose concentrations, the drug is accumulating.

1.5.2.3 Individual Variability: Spaghetti Plots

Spaghetti plots reveal between-subject variability by overlaying individual profiles (gray) with population mean (red).

Show/Hide Code
# Add identifier for distinguishing data vs mean
df_mean = @rsubset df_mean !ismissing(:Cmean)
df_conc = @chain copy(wide_data) begin
    @rsubset :DOSE != 0
    @rsubset !ismissing(:CONC)
    @rsubset :PROFDAY  [1, 6]
end
df_conc.source .= "Individual"
df_mean.source .= "Mean"

# Create layers
individual_layer = data(df_conc) *
                   mapping(:PROFTIME, :CONC;
                       group=:ID => nonnumeric,
                       row=:PROFDAY => day_renamer,
                       col=:TRTACT => "Dose") *
                   visual(Lines; color=(:gray, 0.6), linewidth=0.7)

mean_layer = data(df_mean) *
             mapping(:PROFTIME, :Cmean;
                 row=:PROFDAY => day_renamer,
                 col=:TRTACT => "Dose") *
             visual(Lines; color=:red, linewidth=4)

plt_spgt_mean = draw(
    individual_layer + mean_layer;
    axis=(;
        xlabel="Time (hours)",
        ylabel="Concentration (ng/mL)",
        yscale=log10,
        ytickformat=x -> string.(round.(x; digits=1)),
        yticks=[0.1, 1, 2, 4, 10, 20, 40, 60, 100],
        ygridwidth=1,
        yminorgridcolor=(:lightgrey, 0.5),
        yminorticksvisible=true,
        yminorgridvisible=true,
        yminorticks=IntervalsBetween(2),
        xticks=0:4:24,
        limits=(nothing, 24, nothing, 100),
    ),
    figure=(; size=(1600, 1200), fontsize=28),
)
# save("./figures/fig2-concvstime.png", plt_spgt_mean; px_per_unit = 2)
plt_spgt_mean
Figure 3: Individual concentration-time profiles (gray, semi-transparent) overlaid with mean profile (red, bold) for each dose level and study day. Between-subject variability is visually apparent in the spread of individual trajectories. Profiles are faceted by dose (columns) and day (rows) to assess both dose-response and temporal patterns.

This plot clearly shows the variability across subjects (gray lines) and how the mean profile (red line) represents the central tendency.

1.5.2.4 Individual Subject Panels

For detailed examination of individual profiles, the observations_vs_time() function creates separate panels for each subject, useful for identifying outliers or unusual PK patterns.

Show/Hide Code
using QuartoTools: Tabset

function ovst(pop)
    observations_vs_time(
        pop;
        observations=[:CONC],
        paginate=true,
        separate=true,
        figure=(; size=(1200, 900), fontsize=24),
        axis=(;
            xlabel="Time (hours)",
            ylabel="Concentration (ng/mL)",
            ygridwidth=1,
            yminorgridcolor=(:lightgrey, 0.5),
            yminorticksvisible=true,
            yminorgridvisible=true,
            yminorticks=IntervalsBetween(2),
        ),
        legend=(; position=:bottom)
    )
end

obs_plots = ovst(population)

# Display all using Tabset
Tabset([
    "Panel $(id)" => plot for (id, plot) in enumerate(obs_plots)
])
Figure 4

Setting paginate = true splits subjects across multiple figure pages, which is essential for datasets with many subjects (>9). Each page can be accessed via indexing (obs_plots[1], obs_plots[2], etc.).

Advanced Visualization (ⓐ)

While we demonstrate basic plotting the tutorials in tutorials.pumas.ai provide comprehensive examples using AlgebraOfGraphics.jl and CairoMakie.jl for:

  • Publication-quality figures with custom themes
  • Complex faceting and stratification
  • Interactive plots for exploratory analysis
  • Visual predictive checks and diagnostic plots

1.5.3 PD Data Visualization

Having characterized PK profiles, we now explore pharmacodynamic endpoints to understand exposure-response relationships, temporal dynamics, and patient variability in drug response.

1.5.3.1 Pharmacodynamic Data Exploration

In addition to characterizing PK profiles, exploratory analysis of pharmacodynamic endpoints is essential for understanding the exposure-response relationship. The wide_data DataFrame already contains matched PK (:CONC) and PD (:PD_CONC) observations at the same timepoints, making it straightforward to explore:

  1. PD time-course profiles - How does the biomarker change over time and across doses?
  2. Exposure-response relationships - How does PD relate to PK concentration?
  3. Hysteresis and delay - Is there a temporal lag between drug concentration and effect?
  4. Covariate effects - Do patient characteristics modify the exposure-response relationship?

1.5.3.2 PD Time-Course by Dose [DataFrame]

First, let’s create a summary DataFrame for mean PD profiles by dose and time:

Show/Hide Code
# Calculate mean PD metrics by dose and time
df_pd_mean = @chain wide_data begin
    @rsubset :NOMTIME >= 0
    @rsubset :PROFDAY >= 6
    @rsubset !ismissing(:PD_CONC) && :EVID == 0
    @by [:DOSE, :PROFDAY, :PROFTIME] begin
        :PD_mean = mean(:PD_CONC)
        :PD_median = median(:PD_CONC)
        :PD_std = std(:PD_CONC)
        :PD_n = length(:PD_CONC)
        :PD_sem = std(:PD_CONC) / sqrt(length(:PD_CONC))
        :PD_time = median(:PROFTIME)
    end
    @rtransform begin
        :lci = :PD_mean - 1.96 * (:PD_sem)
        :uci = :PD_mean + 1.96 * (:PD_sem)
    end
end

Now we can visualize the mean PD time-course profiles colored by dose:

Show/Hide Code
# Mean PD profiles with 95% CI ribbons by dose (Day 6)
plt_pd_time_mean = data(df_pd_mean) *
                   mapping(
                       :PD_time => "Time (Days)",
                       :PD_mean => "Mean PD Response (IU/L)";
                       color=:DOSE => dose_renamer => "Dose",
                   ) *
                   (visual(Lines, linewidth=4))
plt_pd_time_ci = data(df_pd_mean) *
                 mapping(:PD_time => "Time (Days)",
                     #  :PD_mean => "Mean PD Response (IU/L)",
                     :lci => "Lower CI",
                     :uci => "Upper CI",
                     color=:DOSE => dose_renamer => "Dose") *
                 visual(Band, alpha=0.4);

draw(plt_pd_time_mean + plt_pd_time_ci;
    axis=(;
        xlabel="Time (Days)",
        ylabel="Mean PD Response (IU/L)",
    ),
    legend=(; position=:bottom),
    figure=(; size=(1200, 700), fontsize=24)
)
Figure 5: Mean pharmacodynamic response profiles over time at steady state (Day 6 onward) stratified by dose level. Lines represent mean response with 95% confidence intervals (shaded bands). Dose-dependent increase in response magnitude and duration demonstrates clear exposure-response relationship. Return to baseline after final dose indicates reversibility of effect.

These plots reveal the overall shape of the PD time-course and help identify:

  • Onset of effect (time to peak response)
  • Duration of effect (return to baseline)
  • Dose-response relationship (higher doses → larger effects)
  • Accumulation (Day 1 vs Day 6 profiles)

1.5.3.3 Individual PD Variability

Spaghetti plots reveal between-subject variability in PD response magnitude and time-course.

Show/Hide Code
# Prepare data for individual PD plots
df_pd_individual = @chain wide_data begin
    @rsubset begin
        !ismissing(:PD_CONC)
        :EVID == 0
        :PROFDAY >= 6
        :NOMTIME >= 0

    end
    dropmissing(:PD_CONC)
end

# Individual layers
individual_pd_layer = data(df_pd_individual) *
                      mapping(
                          :PROFTIME => "Time (hours)",
                          :PD_CONC => "PD Response (IU/L)";
                          group=:ID => nonnumeric,
                          #   row = :PROFDAY => nonnumeric,# => day_renamer, # with this the plots are empty
                          layout=:DOSE => dose_renamer => "Dose"
                      ) *
                      visual(Lines; color=(:gray, 1.0), linewidth=0.7)

# Mean overlay
mean_pd_layer = data(df_pd_mean) *
                mapping(
                    :PD_time => "Time (hours)",
                    :PD_mean => "PD Response (IU/L)";
                    #        row = :PROFDAY => nonnumeric,# => day_renamer,# with this the plots are empty
                    layout=:DOSE => dose_renamer => "Dose"
                ) *
                visual(Lines; color=:red, linewidth=4)

draw(
    individual_pd_layer + mean_pd_layer;
    axis=(;
        xlabel="Time (hours)",
        ylabel="PD Response (IU/L)",
        xticks=0:48:196,
    ),
    figure=(; title="Day 6 PD profiles", size=(1600, 1200), fontsize=28)
)
Figure 6: Individual pharmacodynamic response trajectories (gray lines) overlaid with population mean (red line) for each dose level at steady state. Between-subject variability in both baseline response and drug-induced stimulation is evident. Subjects with higher baseline levels show proportionally greater absolute response increases, suggesting potential for covariate effects in PD modeling.

The spaghetti plots clearly show:

  • Individual subject variability (gray lines)
  • The mean trend (red line)
  • Whether variability increases with dose or time
  • Outlier subjects with unusual PD responses

1.5.4 Exposure-Response Relationships

1.5.4.1 Concentration-Response Correlation [DataFrame]

A key pharmacometric question is: How does PD response relate to drug concentration? Since wide_data already has matched PK and PD observations, we can directly explore this relationship.

At Steady State (Day 6):

Show/Hide Code
# Filter to steady state observations with both PK and PD
df_er_ss = @chain wide_data begin
    @rsubset begin
        # :PROFDAY > 5
        :NOMTIME >= 0
        !ismissing(:CONC) && !ismissing(:PD_CONC)
        :EVID == 0
        :CONC > 0  # Exclude pre-dose samples
    end
end

dose_renamer = renamer(map(d -> d => "$(Int(d)) mg", unique(df_er_ss.DOSE)))
# Scatter plot with dose coloring
plt_er_ss = draw(
    data(df_er_ss) *
    mapping(
        :CONC => "PK Concentration (ng/mL)",
        :PD_CONC => "PD Response (IU/L)";
        color=:DOSE => dose_renamer => "Dose"
    ) *
    visual(Scatter, markersize=12, alpha=0.6);
    axis=(;
        xlabel="PK Concentration (ng/mL)",
        ylabel="PD Response (IU/L)",
        xscale=log10,
        xtickformat=x -> string.(round.(x; digits=1)),
        xticks=[1, 2, 5, 10, 20, 50, 100],
    ),
    legend=(; position=:bottom),
    figure=(; size=(1000, 800), fontsize=24)
)
Figure 7: Exposure-response relationship at steady state showing PD response plotted against PK concentration on a semi-logarithmic scale. Each point represents a matched PKPD observation, colored by dose level. Monotonic increase in response with concentration suggests saturable stimulation mechanism. Overlap between dose groups at similar concentrations supports concentration-driven (rather than dose-driven) response.

This plot reveals:

  • Whether PD increases monotonically with concentration
  • Evidence of saturation (plateau at high concentrations)
  • Dose-proportionality of the exposure-response relationship

Binned Concentration-Response Analysis:

Binning concentrations and calculating mean PD response clarifies the central tendency obscured by individual variability.

Show/Hide Code
# Create concentration bins and calculate mean PD
using Statistics

# Define bins on log scale
conc_bins = [0, 2, 5, 10, 20, 50, 100, 200]
df_er_ss_binned = @chain df_er_ss begin
    @rtransform :CONC_bin = findfirst(x -> :CONC < x, conc_bins[2:end])
    @by :CONC_bin begin
        :CONC_mean = mean(:CONC)
        :PD_mean = mean(:PD_CONC)
        :PD_sem = std(:PD_CONC) / sqrt(length(:PD_CONC))
        :n = length(:PD_CONC)
    end
    @rsubset !ismissing(:CONC_bin)
end

# Add binned statistics to the plot
plt_er_ss_binned = draw(
    data(df_er_ss) *
    mapping(
        :CONC => "PK Concentration (ng/mL)",
        :PD_CONC => "PD Response (IU/L)";
        color=:DOSE => nonnumeric,# => dose_renamer => "Dose"
    ) *
    visual(Scatter, markersize=10, alpha=0.4) +
    data(df_er_ss_binned) *
    mapping(
        :CONC_mean => "PK Concentration (ng/mL)",
        :PD_mean => "PD Response (IU/L)"
    ) *
    visual(Scatter, color=:black, marker=:rect, markersize=16) +
    data(df_er_ss_binned) *
    mapping(
        :CONC_mean => "PK Concentration (ng/mL)",
        :PD_mean => "PD Response (IU/L)",
        :PD_sem => "Standard Error"
    ) *
    visual(Errorbars, color=:black, linewidth=3, whiskerwidth=5);
    axis=(;
        xlabel="PK Concentration (ng/mL)",
        ylabel="PD Response (IU/L)",
        xscale=log10,
        xtickformat=x -> string.(round.(x; digits=1)),
        xticks=[1, 2, 5, 10, 20, 50, 100],
    ),
    legend=(; position=:bottom),
    figure=(; size=(1000, 800), fontsize=24)
)
# save("./figures/fig5-errelation.png", plt_er_ss_binned; px_per_unit = 2)
plt_er_ss_binned
Figure 8: Exposure-response relationship with binned statistics. Individual observations (small colored points) are overlaid with concentration bin means (large black squares) and standard errors (black whiskers). Binned data clearly show steep sigmoid concentration-response curve characteristic of Emax models with Hill coefficient, supporting indirect response model structure with inhibition of kout.

Temporal Stability of Exposure-Response:

Faceting by study day reveals whether the exposure-response relationship changes over time (tolerance or sensitization).

Show/Hide Code
# Filter to all days with PK and PD observations
df_er_time = @chain wide_data begin
    @rsubset begin
        !ismissing(:CONC) && !ismissing(:PD_CONC)
        :EVID == 0
        :CONC > 0
    end
end
day_renamer_pd = renamer(map(d -> d => "Day $(Int(d))", unique(df_er_time.PROFDAY)))
plt_er_by_day = draw(
    data(df_er_time) *
    mapping(
        :CONC => "PK Concentration (ng/mL)",
        :PD_CONC => "PD Response (IU/L)";
        color=:DOSE => dose_renamer => "Dose",
        layout=:PROFDAY => day_renamer_pd => "Study Day"
    ) *
    visual(Scatter, markersize=10, alpha=0.6);
    axis=(;
        xlabel="PK Concentration (ng/mL)",
        ylabel="PD Response (IU/L)",
        xscale=log10,
        xtickformat=x -> string.(round.(x; digits=1)),
    ),
    legend=(; position=:bottom),
    figure=(; size=(1400, 600), fontsize=24)
)
Figure 9: Exposure-response relationships stratified by study day. Each panel shows PD response versus PK concentration for a specific day of treatment. Consistent exposure-response relationship across days (no systematic panel-to-panel shift) indicates absence of tolerance or sensitization, supporting time-invariant PD parameters in subsequent modeling.

This faceted view helps identify:

  • Tolerance development (same concentration → less effect over time)
  • Sensitization (same concentration → more effect over time)
  • Time-invariant exposure-response (no change across days)

1.5.4.2 Hysteresis Analysis

Hysteresis plots reveal temporal delays between concentration and effect by plotting PD versus PK as time-ordered trajectories.

Interpreting Hysteresis Patterns:

  • Clockwise loops: Effect lags concentration (equilibration delay to effect site)
  • Counter-clockwise loops: Effect precedes concentration (tolerance or indirect mechanism)
  • Straight lines: Rapid equilibration (direct response model appropriate)
Show/Hide Code
# Prepare data for hysteresis plots
df_hysteresis = @chain wide_data begin
    @rsubset begin
        !ismissing(:CONC) && !ismissing(:PD_CONC)
        :EVID == 0
        #    :PROFDAY == 1  # Focus on steady state
        :DOSE > 0
    end
    @orderby :ID :NOMTIME  # Ensure temporal ordering
end

# Create subject-specific trajectories with arrows
# Note: For simplicity, we'll show a subset of subjects
subject_subset = unique(df_hysteresis.ID)[1:20]

plt_hysteresis = draw(
    data(@rsubset(df_hysteresis, :ID  subject_subset)) *
    mapping(
        :CONC => "PK Concentration (ng/mL)",
        :PD_CONC => "PD Response (IU/L)";
        color=:NOMTIME => "Time (hours)",
        layout=:ID => renamer(map(d -> d => "ID: $(d)", subject_subset)),
    ) *
    visual(Lines, linewidth=2);
    axis=(;
        xlabel="PK Concentration (ng/mL)",
        ylabel="PD Response (IU/L)",
        xscale=log10,
        xtickformat=x -> string.(round.(x; digits=1)),
    ),
    figure=(; size=(2000, 2000), fontsize=18)
)
Figure 10: Hysteresis plots showing PKPD trajectories over time for individual subjects (subset shown). Each line traces one subject’s time course through concentration-response space, colored by time. Hysteresis loops are evident, characteristic of the effect compartment delay and indirect response mechanisms. The combination of effect compartment (introducing time lag) and Kout inhibition creates pronounced hysteresis supporting the model structure.

1.5.4.3 Covariate Effects on Exposure-Response

We investigate whether demographic characteristics modify the exposure-response relationship, which would inform covariate inclusion in PD modeling.

Show/Hide Code
# Exposure-response stratified by sex
df_er_sex = @chain wide_data begin
    @rsubset begin
        :PROFDAY > 1
        !ismissing(:CONC) && !ismissing(:PD_CONC)
        :EVID == 0
        :CONC > 0
        :DOSE > 0
    end
end

plt_er_by_sex = draw(
    data(df_er_sex) *
    mapping(
        :CONC => "PK Concentration (ng/mL)",
        :PD_CONC => "PD Response (IU/L)";
        color=:SEX => "Sex",
        col=:SEX => "Sex"
    ) *
    visual(Scatter, markersize=12, alpha=0.6);
    axis=(;
        xlabel="PK Concentration (ng/mL)",
        ylabel="PD Response (IU/L)",
        xscale=log10,
        xtickformat=x -> string.(round.(x; digits=1)),
    ),
    legend=(; position=:bottom),
    figure=(; size=(1400, 600), fontsize=24)
)
Figure 11: Exposure-response relationships stratified by sex. Separate panels for males and females enable comparison of both intercepts (baseline response) and slopes (drug sensitivity). Similar exposure-response relationships between sexes suggest sex may not be a significant PD covariate, though formal covariate testing in the population PD model is required for definitive conclusions.

This analysis helps determine:

  • Whether males and females have different sensitivity (slope) to the drug
  • Whether baseline PD differs between sexes (intercept shift)
  • Whether covariate modeling will be necessary in subsequent PKPD model development

Summary: Exploratory Data Analysis

In this section, we completed comprehensive visualization of PK and PD data:

  • Summary tables confirmed data quality and demographic balance
  • PK profiles revealed dose proportionality and accumulation patterns
  • PD profiles showed dose-dependent stimulation with reversible effects
  • Exposure-response analysis indicated saturable, time-invariant PD mechanism
  • Hysteresis patterns supported indirect response model structure with effect compartment

In this section, we’ve completed Stages 1-2 of the workflow map:

Stage 1: Data Preparation

  • ✓ Loaded data from CSV using CSV.read()
  • ✓ Created a population type for different analyses:
    • population / population_placebo: Population objects using read_pumas() for NLME modeling

Stage 2: Exploratory Data Analysis

  • ✓ Generated summary tables with overview_table() and table_one()
  • ✓ Visualized concentration-time profiles and covariate distributions

Having explored the data visually and confirmed its quality, we begin quantitative analysis with non-compartmental methods. NCA provides model-independent exposure metrics that inform dose selection and serve as initial parameter estimates for subsequent population PK modeling.

1.6 A.3 Non-Compartmental Analysis

1.6.1 NCA Fundamentals

Non-compartmental analysis estimates PK parameters without assuming a specific compartmental structure. Instead, it uses numerical integration (trapezoidal rule) and statistical moments to derive exposure metrics. Key advantages of NCA include:

  • Model-independence: No assumptions about compartments or absorption/elimination processes
  • Speed: Fast computation, suitable for large datasets
  • Regulatory acceptance: Widely used for bioequivalence and dose-proportionality studies
  • Initial estimates: Provides starting values for population PK models

Common NCA parameters include:

  • AUC (Area Under the Curve): Total drug exposure
  • Cmax: Maximum observed concentration
  • Tmax: Time of maximum concentration
  • : Terminal elimination half-life
  • CL/F: Apparent clearance (dose/AUC)
  • Vz/F: Apparent volume of distribution

For multiple-dose studies, we also assess:

  • Accumulation ratio: Ratio of steady-state to first-dose exposure
  • Dose proportionality: Linearity of exposure with increasing dose

In the previous section, we already covered the the creation of a NCAPopulation and used that object to understand basic properties of the data such as profiles over day and across doses, extent of variability by visualizing spaghetti plots and finally accumulation from first dose to steady state via the group profiles over days across different doses. One thing that was noticed from the plots in the previous section was that rich concentration time profiles are only available on days 1 and 6 and the remaining days have only one sample. NCA is desirable over full profiles and hence we created an NCAPopulation using only PROFDAYs 1 and 6.

Next we directly jump into computation of the NCA parameters.

1.6.2 Calculate NCA Parameters

To calculate all standard NCA parameters, use run_nca():

Show/Hide Code
nca_report = run_nca(population_nca)

This returns an NCAReport object containing:

  • reportdf: DataFrame with all calculated parameters for each subject
  • Parameter-specific methods: Access individual parameters like nca_report.auc, nca_report.cmax, etc.

1.6.2.1 Viewing the Report

Show/Hide Code
nca_report_df = DataFrame(nca_report.reportdf)
first(nca_report_df, 10)
10×40 DataFrame
Row id PROFDAY DOSE dose tlag tmax cmax tlast clast clast_pred auclast kel half_life aucinf_obs aucinf_pred vz_f_obs cl_f_obs vz_f_pred cl_f_pred n_samples cmax_dn auclast_dn aucinf_dn_obs auc_extrap_obs aucinf_dn_pred auc_extrap_pred aumclast aumcinf_obs aumc_extrap_obs aumcinf_pred aumc_extrap_pred n_samples_kel rsq_kel rsq_adj_kel corr_kel intercept_kel kel_t_low kel_t_high span route
String Float64 Int64 Float64 Missing Float64 Float64 Float64 Float64 Float64? Float64 Float64? Float64? Float64? Float64? Float64? Float64? Float64? Float64? Int64 Float64 Float64 Float64? Float64? Float64? Float64? Float64 Float64? Float64? Float64? Float64? Int64? Float64? Float64? Float64? Float64? Float64? Float64? Float64? String
1 100_1 1.0 100 100.0 missing 1.0 2.65 23.9 0.57 0.559479 21.2625 0.0263839 26.2716 42.8666 42.4678 88.4183 2.33282 89.2486 2.35472 9 0.0265 0.212625 0.428666 50.3985 0.424678 49.9327 195.892 1531.07 87.2055 1506.42 86.9962 3 0.960105 0.920211 0.97985 0.049825 12.0 23.9 0.45296 EV
2 100_10 1.0 100 100.0 missing 1.0 2.45 23.9 1.35 1.16566 35.773 0.0228455 30.3406 94.8656 86.7966 46.1414 1.05412 50.4309 1.15212 9 0.0245 0.35773 0.948656 62.2909 0.867966 58.7852 387.303 4386.23 91.17 3840.19 89.9145 6 0.59836 0.49795 0.773537 0.699294 2.0 23.9 0.721804 EV
3 100_2 1.0 100 100.0 missing 1.0 1.86 23.9 0.51 0.385273 18.7815 0.056208 12.3318 27.8549 25.6359 63.8704 3.59003 69.399 3.90078 9 0.0186 0.187815 0.278549 32.5739 0.256359 26.7376 166.798 545.08 69.3993 452.566 63.1439 6 0.809214 0.761517 0.899563 0.389567 2.0 23.9 1.77589 EV
4 100_3 1.0 100 100.0 missing 2.0 1.89 23.9 0.69 0.653744 23.674 0.0229665 30.1808 53.7178 52.1392 81.0565 1.86158 83.5107 1.91794 9 0.0189 0.23674 0.537178 55.929 0.521392 54.5946 230.034 2256.24 89.8045 2149.77 89.2996 4 0.853196 0.779793 0.923686 0.123859 8.0 23.9 0.526824 EV
5 100_4 1.0 100 100.0 missing 0.5 3.03 23.9 0.64 0.565717 23.898 0.0346612 19.9978 42.3625 40.2194 68.1044 2.36058 71.7334 2.48637 9 0.0303 0.23898 0.423625 43.5868 0.402194 40.5808 213.163 1187.18 82.0446 1074.13 80.1548 5 0.833651 0.778201 0.913045 0.258742 4.0 23.9 0.99511 EV
6 100_5 1.0 100 100.0 missing 1.0 2.13 23.9 0.71 0.703949 23.896 0.0129469 53.5378 78.7355 78.2682 98.099 1.27007 98.6847 1.27766 9 0.0213 0.23896 0.787355 69.6503 0.782682 69.4691 232.416 5778.82 95.9781 5731.55 95.945 4 0.988631 0.982946 0.994299 -0.0416186 8.0 23.9 0.296986 EV
7 100_6 1.0 100 100.0 missing 2.0 1.01 23.9 0.43 0.355733 12.577 0.0293419 23.6231 27.2318 24.7007 125.151 3.67218 137.975 4.04847 9 0.0101 0.12577 0.272318 53.815 0.247007 49.0824 125.045 974.744 87.1715 827.989 84.8977 5 0.593508 0.458011 0.770395 -0.332302 4.0 23.9 0.842396 EV
8 100_7 1.0 100 100.0 missing 1.0 2.12 23.9 0.62 0.598411 22.235 0.028518 24.3056 43.9756 43.2186 79.7386 2.27399 81.1353 2.31382 9 0.0212 0.22235 0.439756 49.4379 0.432186 48.5522 214.385 1496.33 85.6727 1451.69 85.2321 3 0.885965 0.77193 0.941257 0.168103 12.0 23.9 0.489599 EV
9 100_8 1.0 100 100.0 missing 0.5 3.45 23.9 0.35 0.336418 16.8565 0.0273938 25.3031 29.6331 29.1373 123.189 3.3746 125.285 3.43203 9 0.0345 0.168565 0.296331 43.116 0.291373 42.148 125.737 897.505 85.9903 867.555 85.5067 5 0.967016 0.956022 0.98337 -0.43469 4.0 23.9 0.786465 EV
10 100_9 1.0 100 100.0 missing 0.5 2.04 23.9 1.01 0.802494 26.404 0.0303423 22.8443 59.6909 52.852 55.2133 1.6753 62.3577 1.89208 9 0.0204 0.26404 0.596909 55.7654 0.52852 50.0416 279.781 2172.38 87.121 1783.54 84.3132 7 0.578239 0.493887 0.76042 0.50515 1.0 23.9 1.00244 EV

Common parameters in the report include:

  • lambdaz: Terminal elimination rate constant
  • half_life: Terminal half-life (t½)
  • tmax: Time of maximum concentration
  • cmax: Maximum concentration
  • auclast: AUC from time zero to last measurable concentration
  • aucinf_obs: AUC from time zero to infinity (observed)
  • vz_f_obs: Apparent volume of distribution (Vz/F)
  • cl_f_obs: Apparent clearance (CL/F)
Selecting Specific Parameters

You can specify which parameters to calculate:

Show/Hide Code
nca_report_custom = run_nca(
    population_nca;
    parameters=["aucinf_obs", "auclast", "cmax", "tmax", "half_life"]
)

See the Pumas NCA documentation for the full list of available parameters.

1.6.3 Assess Dose Proportionality

Dose proportionality analysis evaluates whether exposure increases linearly with dose. We use dose-normalized AUC and Cmax as primary metrics.

1.6.3.1 Visualizing Dose Proportionality

If PK is dose-proportional, dose-normalized AUC and Cmax should be independent of dose (i.e., constant across dose levels).

Show/Hide Code
using AlgebraOfGraphics, CairoMakie
# Extract AUC and Cmax from NCA report, removing missing values
nca_summary = @select nca_report_df :id :PROFDAY :DOSE :cmax_dn :aucinf_dn_obs
dropmissing!(nca_summary)

fig_dose_prop = Figure(; size=(800, 800))

day_renamer = renamer(map(d -> d => "Day $(Int(d))", unique(nca_summary.PROFDAY)))
# AUC vs Dose
plt_auc_dose = data(nca_summary) *
               mapping(:DOSE => nonnumeric, :aucinf_dn_obs;
                   layout=:PROFDAY => day_renamer) *
               visual(Violin; show_median=true)

draw!(
    fig_dose_prop[1, 1],
    plt_auc_dose;
    axis=(;
        xlabel="Dose (mg)",
        ylabel="Dose-Normalized AUC₀₋∞ (ng·hr/mL/mg)",
        #title = "AUC Dose Proportionality",
    ),
)

# Cmax vs Dose
plt_cmax_dose = data(nca_summary) *
                mapping(:DOSE => nonnumeric, :cmax_dn;
                    layout=:PROFDAY => day_renamer) *
                visual(Violin; show_median=true)

draw!(
    fig_dose_prop[2, 1],
    plt_cmax_dose;
    axis=(;
        xlabel="Dose (mg)",
        ylabel="Dose-Normalized Cmax (ng/mL/mg)",
        # title = "Cmax Dose Proportionality",
    ),
)

fig_dose_prop
Figure 12: Dose proportionality assessment using dose-normalized exposure metrics. Top panel shows dose-normalized AUC₀₋∞, bottom panel shows dose-normalized Cmax, both stratified by study day. Violin plots display distribution of normalized values across subjects at each dose level. Approximately constant normalized values across doses (flat distributions) indicate dose-proportional pharmacokinetics. Median shown as horizontal line within each violin.

Interpretation:

  • Flat distributions across doses: Dose-proportional PK
  • Increasing trend: Less than dose-proportional (possibly saturable absorption or increased clearance)
  • Decreasing trend: More than dose-proportional (possibly saturable clearance)

1.6.3.2 Statistical Assessment of Dose Proportionality

For formal statistical testing, the power model is commonly used:

log(Parameter) ~ log(α) + β * log(Dose)

Where:

  • β = 1: Dose-proportional
  • β < 1: Less than dose-proportional
  • β > 1: More than dose-proportional

This analysis is typically performed using linear regression on log-transformed data. Below is the example just on cmax.

Show/Hide Code
dp_cmax = NCA.DoseLinearityPowerModel(nca_report, :cmax)
Dose Linearity Power Model
Variable: cmax
Model: log(cmax) ~ log(α) + β × log(dose)
────────────────────────────────────
   Estimate  low CI 95%  high CI 95%
────────────────────────────────────
β   1.01617    0.921403      1.11094
────────────────────────────────────

Here’s a visualization for the dose linearity using a power model for cmax

Show/Hide Code
power_model(dp_cmax; legend=(; position=:bottom))

1.6.4 Summary: Non-Compartmental Analysis

In this section, we demonstrated NCA workflows for multiple ascending dose studies:

NCA Population Creation (ⓐ)

  • Used read_nca() to create NCAPopulation objects
  • Grouped by dose level and covariates for stratified analysis

Dose-Normalized Concentration Analysis

  • Visualized profiles to assess dose linearity
  • Compared first dose vs. steady state

Parameter Calculation (ⓐ)

  • Used run_nca() for comprehensive NCA reports

Dose Proportionality Assessment

  • Evaluated dose-normalized AUC and Cmax across dose levels
  • Used violin plots to visualize variability and trends
  • Performed statistical assessment of dose proportionality

NCA provides rapid, model-independent PK characterization and can inform:

  1. Dose selection for later-stage studies
  2. Initial parameter estimates for population PK models (next section)
  3. Regulatory submissions (bioequivalence, dose proportionality)
  4. Early decision-making in drug development

1.7 A.4 Population Pharmacokinetic Modeling

Having characterized exposure metrics via NCA, we now build mechanistic compartmental models to describe concentration-time data. This section demonstrates the complete model development workflow: base model fitting, diagnostics, model comparison, random effect refinement, covariate selection, and parameter uncertainty quantification.

🗺️ Modeling Journey (Stages 4-8)

This section covers multiple workflow stages:

  • Stage 4: Base model specification
  • Stage 5: Model fitting and initial diagnostics
  • Stage 6: Model comparison and refinement
  • Stage 7: Covariate modeling
  • Stage 8: Final model validation

Each stage builds on the previous, demonstrating iterative model development.

1.7.1 One-Compartment Base Model

1.7.1.1 Define Model Structure

1.7.1.2 One-Compartment Oral Model

Show/Hide Code
oral_model = @model begin
    @metadata begin
        desc = "One-compartment oral absorption model"
        timeu = u"hr" # hour
    end

    @param begin
        θka  RealDomain(lower=0.1)
        θcl  RealDomain(lower=0.1)
        θvc  RealDomain(lower=1.0, upper=500)

        Ω_pk  PDiagDomain(3)

        σ_add_pk  RealDomain(lower=0.0)
        σ_prop_pk  RealDomain(lower=0.0)
    end

    @random begin
        η ~ MvNormal(Ω_pk)
    end

    @pre begin
        ηka, ηcl, ηvc = η
        Ka = θka * exp(ηka)
        CL = θcl * exp(ηcl)
        Vc = θvc * exp(ηvc)
    end

    @dynamics Depots1Central1

    @derived begin
        ipred := @. Central / Vc
        CONC ~ @. Normal(ipred, sqrt(σ_add_pk^2 + (ipred * σ_prop_pk)^2))
    end
end
PumasModel
  Parameters: θka, θcl, θvc, Ω_pk, σ_add_pk, σ_prop_pk
  Random effects: η
  Covariates:
  Dynamical system variables: Depot, Central
  Dynamical system type: Closed form
  Derived: CONC
  Observed: CONC

1.7.1.3 Specify Initial Parameter Estimates

We use NCA-derived values to inform initial estimates for clearance and volume parameters.

Show/Hide Code
initial_est_oral_model = (
    θka=0.9,
    θcl=4.3,
    θvc=252.0,
    Ω_pk=Diagonal([0.1, 0.1, 0.1]),
    σ_add_pk=sqrt(0.09),
    σ_prop_pk=sqrt(0.01),
)
(θka = 0.9,
 θcl = 4.3,
 θvc = 252.0,
 Ω_pk = [0.1 0.0 0.0; 0.0 0.1 0.0; 0.0 0.0 0.1],
 σ_add_pk = 0.3,
 σ_prop_pk = 0.1,)

1.7.1.4 Validate Initial Parameters

Before fitting, we validate that initial parameters yield reasonable log-likelihood and that no subjects are highly influential.

Show/Hide Code
foce_loglikelihood = loglikelihood(oral_model, population, initial_est_oral_model, FOCE())
-5311.612884011242
Show/Hide Code
foce_fi = findinfluential(oral_model, population, initial_est_oral_model, FOCE())
50-element Vector{@NamedTuple{id::String, nll::Float64}}:
 (id = "1600_8", nll = 266.42908266386814)
 (id = "1600_2", nll = 228.20685400233194)
 (id = "1600_5", nll = 225.38105794476454)
 (id = "1600_1", nll = 203.29683299466527)
 (id = "800_6", nll = 202.6465624934891)
 (id = "800_1", nll = 189.5454835861801)
 (id = "800_7", nll = 188.6610414786809)
 (id = "1600_9", nll = 184.77508968321845)
 (id = "800_2", nll = 176.5944189955281)
 (id = "1600_6", nll = 169.93898812529602)
 ⋮
 (id = "200_2", nll = 43.37195275343636)
 (id = "200_10", nll = 40.68007340784086)
 (id = "100_3", nll = 38.53032098056971)
 (id = "100_5", nll = 38.37424787341113)
 (id = "200_5", nll = 36.651641107865856)
 (id = "100_7", nll = 32.69390797435821)
 (id = "100_2", nll = 29.7365955162019)
 (id = "200_1", nll = 29.23535715676298)
 (id = "100_6", nll = 7.419946574114995)

1.7.1.5 Prior Predictive Check

Simulate from the model using initial parameters to verify they generate plausible concentration profiles before fitting.

Show/Hide Code
pre_fit_sim = simobs(oral_model, population, initial_est_oral_model; rng)
Simulated population (Vector{<:Subject})
  Simulated subjects: 50
  Simulated variables: CONC
Show/Hide Code
using PumasUtilities
using CairoMakie
sim_plot(pre_fit_sim,
    color=(:grey, 0.5),
    linewidth=0.5,
    markercolor=(:black, 0.5),
    axis=(yscale=Makie.pseudolog10,
        xlabel="Time (hours)",
        ylabel="Mean Concentration (ng/mL)",
        xticks=0:24:240),
    legend=(; position=:bottom))
Figure 13: Prior predictive check showing simulated concentration-time profiles (gray lines) using initial parameter estimates before model fitting. This visualization verifies that starting values generate physiologically plausible profiles. Substantial mismatch would indicate need to revise initial estimates before attempting parameter estimation.

1.7.1.6 Fit Model Using Progressive Estimation

We use a two-stage approach: first fit using NaivePooled (fast) to get stable fixed effects, then refine with FOCE for full population model.

Show/Hide Code
np_fit = fit(
    oral_model,
    population,
    (; initial_est_oral_model..., Ω_pk=Diagonal([0.0, 0.0, 0.0])),
    NaivePooled();
    constantcoef=(:Ω_pk,),
    optim_options=(; show_trace=false),
    verbose=false,
)
Warning: Observation PD_CONC is not modeled by the model
@ Pumas none:4570
FittedPumasModel

Dynamical system type:                 Closed form

Number of subjects:                             50

Observation records:         Active        Missing
    CONC:                      1350              0
    PD_CONC:                    950            400
    Total:                     2300            400

Number of parameters:      Constant      Optimized
                                  1              7

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                   -3099.9729

------------------------
              Estimate
------------------------
  θka          6.7245
  θcl          1.3155
  θvc         85.528
† Ω_pk₁,₁      0.0
† Ω_pk₂,₂      0.0
† Ω_pk₃,₃      0.0
  σ_add_pk     0.015187
  σ_prop_pk    0.44841
------------------------
† indicates constant parameters
Show/Hide Code
foce_fit = fit(
    oral_model,
    population,
    (; coef(np_fit)..., Ω_pk=initial_est_oral_model.Ω_pk),
    FOCE();
    optim_options=(; show_trace=false),
    verbose=false,
)
Warning: Observation PD_CONC is not modeled by the model
@ Pumas none:4570
FittedPumasModel

Dynamical system type:                 Closed form

Number of subjects:                             50

Observation records:         Active        Missing
    CONC:                      1350              0
    PD_CONC:                    950            400
    Total:                     2300            400

Number of parameters:      Constant      Optimized
                                  0              8

Likelihood approximation:                     FOCE
Likelihood optimizer:                         BFGS

Termination Reason:              NoObjectiveChange
Log-likelihood value:                   -2589.0529

-----------------------
            Estimate
-----------------------
θka          5.7837
θcl          1.3839
θvc         82.823
Ω_pk₁,₁      1.2789e-7
Ω_pk₂,₂      0.093668
Ω_pk₃,₃      0.086028
σ_add_pk     0.015318
σ_prop_pk    0.28534
-----------------------

1.7.2 Model Diagnostics

With the base model fitted, we compute comprehensive diagnostics to assess goodness-of-fit, identify systematic biases, and evaluate model adequacy.

1.7.2.1 Compute Diagnostic Metrics

Show/Hide Code
foce_inspect = inspect(foce_fit)

The resulting PumasInspect object contains:

  • PRED: Population predictions (using population parameters only)
  • IPRED: Individual predictions (using individual parameters with EBEs)
  • WRES/CWRES: (Conditional) weighted residuals
  • EBEs: Individual random effect estimates (η values)

1.7.2.2 Standard Goodness-of-Fit Panel

Pumas provides several pre-defined diagnostic plots covering the most common goodness-of-fit evaluations.

Show/Hide Code
goodness_of_fit(foce_inspect;
    observations=[:CONC],
    markercolor=(:grey, 0.5),
    legend=(
        position=:bottom,
        framevisible=false,
        labelsize=18,
        patchsize=(20, 10),
    ),
    figure=(size=(800, 800), fontsize=18))
Figure 14: Standard goodness-of-fit diagnostics for one-compartment model. Four-panel display includes: (1) observations vs population predictions assessing population-level bias, (2) observations vs individual predictions assessing individual-level fit, (3) conditional weighted residuals vs predictions checking for heteroscedasticity, and (4) conditional weighted residuals vs time after dose identifying time-dependent patterns. Systematic deviations indicate model misspecification.

This plot shows:

  • Observations vs. population predictions (OBS vs. PRED)
  • Observations vs. individual predictions (OBS vs. IPRED)
  • Conditional weighted residuals vs. predictions (CWRES vs. PRED)
  • Conditional weighted residuals vs. time after dose (CWRES vs. TAD)

From these plots, we can assess bias, heteroscedasticity, and systematic trends in residuals.

1.7.2.3 Focused Diagnostic Plots

For detailed examination of specific diagnostic aspects, individual plotting functions provide focused views.

Show/Hide Code
pk_diagnostics_kwargs = (observations=[:CONC],
    markercolor=(:grey, 0.5),
    legend=(
        position=:bottom,
        framevisible=false,
        labelsize=13,
        patchsize=(20, 10),))
Show/Hide Code
observations_vs_predictions(foce_inspect; pk_diagnostics_kwargs...)
Figure 15: Observations versus population predictions for one-compartment model. Identity line (dashed) represents perfect agreement. Systematic deviation from identity indicates population-level bias in model structure or parameter values.
Show/Hide Code
observations_vs_ipredictions(foce_inspect; pk_diagnostics_kwargs...)
Figure 16: Observations versus individual predictions incorporating empirical Bayes estimates. Closer agreement with identity line compared to population predictions indicates that random effects successfully capture between-subject variability.
Show/Hide Code
wresiduals_vs_time(foce_inspect; pk_diagnostics_kwargs...)
Figure 17: Conditional weighted residuals versus time after dose. Random scatter around zero indicates adequate model fit. Systematic trends (e.g., residuals consistently positive at early times) suggest structural misspecification such as incorrect absorption or distribution kinetics.

Each function produces focused plots for specific diagnostics. Additional diagnostic functions like wresiduals_vs_predictions() and observations_vs_ipredictions() are detailed in Appendix XXX[C]XXX.

1.7.2.4 Individual Subject Fits

Visualize model fit for each subject to identify outliers and assess individual-level performance.

Show/Hide Code
all_subject_fits = subject_fits(foce_inspect;  pk_diagnostics_kwargs..., paginate = true, separate = true,)
all_subject_fits[4]
Figure 18: Individual subject concentration-time profiles with model fits. Observed concentrations (points) overlaid with individual predictions (lines) derived from empirical Bayes estimates. Each panel represents one subject. Systematic mismatch in specific subjects may indicate outliers or covariate effects. Only first page of paginated output shown.

This generates individual concentration-time plots overlaying observed data with individual predictions. The paginate = true option splits subjects across multiple pages, and separate = true gives each subject its own panel.

1.7.2.5 Visual Predictive Check

VPC is a simulation-based diagnostic comparing observed data quantiles to model-predicted quantile intervals. Agreement indicates adequate model predictive performance.

Show/Hide Code
foce_vpc = vpc(foce_fit; covariates = [:tad])
[ Info: Continuous VPC
Visual Predictive Check
  Type of VPC: Continuous VPC
  Simulated populations: 499
  Subjects in data: 50
  Stratification variable(s): None
  Confidence level: 0.95
  VPC lines: quantiles ([0.1, 0.5, 0.9])

The covariates = [:tad] argument bins observations by time after dose. The vpc() function simulates many replicates of the dataset and computes prediction intervals for the 5th, 50th, and 95th percentiles.

Show/Hide Code
plt_vpc = vpc_plot(foce_vpc,
    observations=true,
    markercolor=:grey,
    observed_linewidth=4,
    figurelegend=(position=:b, 
                  alignmode=Outside(), 
                  orientation=:horizontal, 
                  nbanks=3),
    markersize=12,
    axis=(yscale=Makie.pseudolog10,
        xticks = 0:8:120,
        yticks = 0:20:120,
        ylabel="Concentration (ng/mL)",
        xlabel="Time (hours)",
        spinewidth=4,
        rightspinevisible=false,
        topspinevisible=false),
    figure=(size=(1800, 1400), fontsize=40))
plt_vpc
Figure 19: Visual predictive check for one-compartment model. Observed data 5th, 50th, and 95th percentiles (solid lines with points) are overlaid on model-predicted percentile intervals (shaded bands). Observations falling outside prediction intervals indicate model misspecification. Systematic deviations in distribution phase suggest need for multi-compartment structure.

The plot shows observed quantiles (lines/points) overlaid on simulated quantile prediction intervals (shaded regions). Deviations between observed and simulated quantiles indicate areas where the model may not adequately describe the data.

1.7.3 Two-Compartment Model Development

Diagnostic plots from the one-compartment model reveal systematic misfit in the distribution phase (early time points show under-prediction). This motivates exploration of a two-compartment structure.

1.7.3.1 Define Two-Compartment Model

The two-compartment model adds a peripheral compartment, requiring two additional parameters: inter-compartmental clearance (Q) and peripheral volume (Vp). The closed-form solution is Depots1Central1Periph1.

Show/Hide Code
oral_model_2cmt = @model begin
    @metadata begin
        desc = "Two-compartment oral absorption model"
        timeu = u"hr"
    end

    @param begin
        θka  RealDomain(lower = 0.01)
        θcl  RealDomain(lower = 0.01)
        θvc  RealDomain(lower = 0.01, upper = 190.0)
        θq  RealDomain(lower = 0.01, upper = 90.0)
        θvp  RealDomain(lower = 0.01, upper = 190.0)

        Ω_pk  PDiagDomain(5)

        σ_add_pk  RealDomain(lower = 0.0)
        σ_prop_pk  RealDomain(lower = 0.0)
    end

    @random begin
        η ~ MvNormal(Ω_pk)
    end

    @pre begin
       ηka, ηcl, ηvc, ηq, ηvp = η
        
        Ka = θka * exp(ηka)
        CL = θcl * exp(ηcl)
        Vc = θvc * exp(ηvc)
        Q = θq   * exp(ηq)
        Vp = θvp * exp(ηvp)
    end

    @dynamics Depots1Central1Periph1

    @derived begin
        ipred := @. Central / Vc
        CONC ~ @. Normal(ipred, sqrt(σ_add_pk^2 + (ipred * σ_prop_pk)^2))
    end
end
PumasModel
  Parameters: θka, θcl, θvc, θq, θvp, Ω_pk, σ_add_pk, σ_prop_pk
  Random effects: η
  Covariates:
  Dynamical system variables: Depot, Central, Peripheral
  Dynamical system type: Closed form
  Derived: CONC
  Observed: CONC

1.7.3.2 Fit Two-Compartment Model

We initialize parameters using one-compartment estimates where applicable and add reasonable starting values for the new Q and Vp parameters.

Show/Hide Code
initial_est_oral_model_2cmt = (
    coef(foce_fit)...,
    θq = 0.9,
    θvp = 0.9,
    Ω_pk = Diagonal([0.1, 0.1, 0.1, 0.1, 0.1]),
)
foce_loglikelihood = loglikelihood(
    oral_model_2cmt,
    population,
    initial_est_oral_model_2cmt,
    FOCE()
)
-2589.1976148451167

Fit using FOCE and existing parameters set to previous fitted values:

Show/Hide Code
foce_fit_2cmt = fit(
    oral_model_2cmt,
    population,
    initial_est_oral_model_2cmt, 
    FOCE();
    optim_options = (; show_trace = false),
    verbose=false,
)

1.7.3.3 Evaluate Two-Compartment Model

Compute diagnostics and compare to one-compartment model to assess improvement.

Show/Hide Code
foce_inspect_2cmt = inspect(foce_fit_2cmt)
Show/Hide Code
fig3_std2cmpt_gof = goodness_of_fit(foce_inspect_2cmt; 
                observations = [:CONC], 
                markercolor = (:grey, 0.5), 
                legend = (position = :bottom,
                        framevisible = false,
                        labelsize = 18,
                        patchsize = (20, 10),
                        ),
                figure = (size = (800, 800), fontsize = 18)
)
# save("./figures/fig3-std2cmpt-gof.png", fig3_std2cmpt_gof; px_per_unit = 2)
fig3_std2cmpt_gof
Figure 20: Goodness-of-fit diagnostics for two-compartment model. Comparison with one-compartment diagnostics (Figure 14) should show reduced systematic deviations, particularly at early time points where distribution phase is prominent. Improved alignment of observations with predictions indicates superior model structure.

The diagnostics should show improved fit compared to the one-compartment model, particularly in the distribution phase.

1.7.3.4 Compare Model Performance

Formal model comparison using parameter estimates and information criteria.

Show/Hide Code
compare_estimates(one_compartment = foce_fit, two_compartment = foce_fit_2cmt)
Table 3: Parameter estimate comparison between one-compartment and two-compartment models. Standard errors, relative standard errors, and confidence intervals provided for each parameter. Addition of peripheral compartment (Q, Vp) should improve model fit with acceptable increase in parameter uncertainty.
12×3 DataFrame
Row parameter one_compartment two_compartment
String Float64? Float64?
1 θka 5.78368 1.8091
2 θcl 1.38387 1.3014
3 θvc 82.8229 33.9426
4 Ω_pk₁,₁ 1.27888e-7 0.00122221
5 Ω_pk₂,₂ 0.0936679 0.17496
6 Ω_pk₃,₃ 0.0860281 0.202418
7 σ_add_pk 0.0153176 0.0150842
8 σ_prop_pk 0.285343 0.0980985
9 θq missing 11.8143
10 θvp missing 77.7894
11 Ω_pk₄,₄ missing 0.0290994
12 Ω_pk₅,₅ missing 0.09743

and the metrics_table() function can be used to compare model performance metrics such as AIC and BIC.

Show/Hide Code
@chain begin
    leftjoin(metrics_table(foce_fit),
         metrics_table(foce_fit_2cmt), on = :Metric, makeunique=true)
    rename(:Value => :One_Compartment,
           :Value_1 => :Two_Compartment)
end
Table 4: Model comparison metrics for one-compartment versus two-compartment structures. Lower AIC and BIC values indicate superior model with penalty for additional parameters. Substantial improvement in log-likelihood should outweigh parameter complexity penalty.
20×3 DataFrame
Row Metric One_Compartment Two_Compartment
String Any Any
1 Successful true true
2 Estimation Time 2.028 7.092
3 Subjects 50 50
4 Fixed Parameters 0 0
5 Optimized Parameters 8 12
6 CONC Active Observations 1350 1350
7 CONC Missing Observations 0 0
8 PD_CONC Active Observations 950 950
9 PD_CONC Missing Observations 400 400
10 Total Active Observations 2300 2300
11 Total Missing Observations 400 400
12 Likelihood Approximation Pumas.FOCE{Optim.NewtonTrustRegion{Float64}, Optim.Options{Float64, Nothing}} Pumas.FOCE{Optim.NewtonTrustRegion{Float64}, Optim.Options{Float64, Nothing}}
13 LogLikelihood (LL) -2589.05 -1351.96
14 -2LL 5178.11 2703.92
15 AIC 5194.11 2727.92
16 BIC 5240.03 2796.81
17 (η-shrinkage) η₁ 0.999 0.824
18 (η-shrinkage) η₂ 0.019 -0.0
19 (η-shrinkage) η₃ 0.03 0.001
20 (ϵ-shrinkage) CONC 0.031 0.066

1.7.3.5 VPC for Two-Compartment Model

Show/Hide Code
foce_vpc_2cmt = vpc(foce_fit_2cmt; 
               covariates = [:tad],
               ensemblealg = EnsembleThreads())
[ Info: Continuous VPC
Visual Predictive Check
  Type of VPC: Continuous VPC
  Simulated populations: 499
  Subjects in data: 50
  Stratification variable(s): None
  Confidence level: 0.95
  VPC lines: quantiles ([0.1, 0.5, 0.9])
Show/Hide Code
plt_vpc_2cmt = vpc_plot(foce_vpc_2cmt,
    observations=true,
    markercolor=:grey,
    observed_linewidth=4,
    figurelegend=(position=:b, 
                  alignmode=Outside(), 
                  orientation=:horizontal, 
                  nbanks=3),
    markersize=12,
    axis=(yscale=Makie.pseudolog10,
        xticks = 0:8:120,
        yticks = 0:20:120,
        ylabel="Concentration (ng/mL)",
        xlabel="Time (hours)",
        spinewidth=4,
        rightspinevisible=false,
        topspinevisible=false),
    figure=(size=(1800, 1400), fontsize=40))
 
# save("./figures/fig4-std2cmpt-vpc.png", plt_vpc_2cmt; px_per_unit = 2)   
plt_vpc_2cmt
Figure 21: Visual predictive check for two-compartment model. Observed percentiles should show improved alignment with prediction intervals compared to one-compartment VPC (Figure 19), particularly in the distribution phase (early time points). Adequate coverage of observed data by prediction bands indicates satisfactory model predictive performance.

1.7.4 Random Effect Structure Refinement

With satisfactory structural model identified (two-compartment), we evaluate random effect necessity. Diagnostics suggest minimal between-subject variability in Ka, motivating a simplified random effect structure.

Show/Hide Code
oral_model_2cmt_noka = @model begin
    @metadata begin
        desc = "Two-compartment oral model without Ka random effect"
        timeu = u"hr"
    end

    @param begin
        θka  RealDomain(lower = 0.01)
        θcl  RealDomain(lower = 0.01)
        θvc  RealDomain(lower = 0.01, upper = 190.0)
        θq  RealDomain(lower = 0.01, upper = 90.0)
        θvp  RealDomain(lower = 0.01, upper = 190.0)

        Ω_pk  PDiagDomain(4)  # Now 4 instead of 5

        σ_add_pk  RealDomain(lower = 0.0)
        σ_prop_pk  RealDomain(lower = 0.0)
    end

    @random begin
        η ~ MvNormal(Ω_pk)
    end

    @pre begin
        ηcl, ηvc, ηq, ηvp = η  # Only 4 random effects now
        Ka = θka  # No random effect
        CL = θcl * exp(ηcl)
        Vc = θvc * exp(ηvc)
        Q = θq * exp(ηq)
        Vp = θvp * exp(ηvp)
    end

    @dynamics Depots1Central1Periph1

    @derived begin
        ipred := @. Central / Vc
        CONC ~ @. Normal(ipred, sqrt(σ_add_pk^2 + (ipred * σ_prop_pk)^2))
    end
end
PumasModel
  Parameters: θka, θcl, θvc, θq, θvp, Ω_pk, σ_add_pk, σ_prop_pk
  Random effects: η
  Covariates:
  Dynamical system variables: Depot, Central, Peripheral
  Dynamical system type: Closed form
  Derived: CONC
  Observed: CONC

Fit the model:

Show/Hide Code
foce_noka = fit(
    oral_model_2cmt_noka,
    population,
    (coef(foce_fit_2cmt)..., Ω_pk = Diagonal([0.1, 0.1, 0.1, 0.1])),
    FOCE();
    optim_options = (; show_trace = false),
    verbose=false,
)

To compare parameter estimates between models, use compare_estimates():

Show/Hide Code
compare_estimates(with_ka = foce_fit_2cmt, without_ka = foce_noka)
12×3 DataFrame
Row parameter with_ka without_ka
String Float64? Float64?
1 θka 1.8091 1.80817
2 θcl 1.3014 1.30141
3 θvc 33.9426 33.9226
4 θq 11.8143 11.8179
5 θvp 77.7894 77.7969
6 Ω_pk₁,₁ 0.00122221 0.174929
7 Ω_pk₂,₂ 0.17496 0.201822
8 Ω_pk₃,₃ 0.202418 0.0297052
9 Ω_pk₄,₄ 0.0290994 0.0977277
10 σ_add_pk 0.0150842 0.0150817
11 σ_prop_pk 0.0980985 0.098142
12 Ω_pk₅,₅ 0.09743 missing

This produces a side-by-side table of parameter estimates, making it easy to see how removing the Ka random effect impacts other parameters.

For model selection, use information criteria:

Show/Hide Code
bic_ka, bic_noka = bic(foce_fit_2cmt), bic(foce_noka)
(2796.806076045644, 2789.0851311814536)
Show/Hide Code
ll_ka, ll_noka = loglikelihood(foce_fit_2cmt), loglikelihood(foce_noka)
(-1351.9590516113187, -1351.968911380182)

Lower BIC indicates better model fit penalized for complexity. If BIC improves (decreases) when removing the Ka random effect, the simpler model is preferred.

Model Selection Criteria (ⓐ)
  • AIC (Akaike Information Criterion): Penalizes additional parameters less aggressively than BIC
  • BIC (Bayesian Information Criterion): Penalizes additional parameters based on sample size
  • Log-Likelihood Ratio Test: For nested models, compute LRT = -2 × (LLreduced - LLfull), which follows a χ² distribution with degrees of freedom equal to the difference in number of parameters

1.7.5 Covariate Modeling

With structural and random effect models established, we investigate covariate relationships to explain residual between-subject variability and enable population extrapolation.

1.7.5.1 Identify Candidate Covariates

Show/Hide Code
foce_inspect_2cmt_noka = inspect(foce_noka)

1.7.5.2 Diagnosing Covariates Based on Empirical Bayes Estimates (✓)

The empirical_bayes_vs_covariates() function plots EBEs (individual random effects) against available covariates:

Show/Hide Code
empirical_bayes_vs_covariates(foce_inspect_2cmt_noka; 
                              covariates = [:WEIGHTB, :SEX, :DOSE],
                              categorical = [:SEX, :DOSE],
                              color = (:grey, 0.5))
Figure 22: Empirical Bayes estimates (individual random effects) plotted against available covariates (body weight, sex, dose level) to identify potential covariate relationships. Trends or correlations suggest covariates may explain between-subject variability currently captured by random effects. Weight shows positive correlation with Vc random effects; sex may influence CL random effects. These patterns guide covariate model development.

If EBEs show trends or correlations with covariates, it suggests the covariate may explain some of the unexplained variability currently captured by random effects. This guides covariate selection.

1.7.5.3 Covariate Implementation Strategies

Covariate models should be coded such that setting the covariate parameter to zero (or a neutral value) removes the covariate effect. This makes model comparison straightforward.

1.7.5.3.1 Categorical Covariates (✓)

For categorical covariates (e.g., sex), we use conditional logic to apply different parameter values to different categories. Consider an effect of sex on clearance:

Using if-elseif-else blocks:

@pre begin
    COVsexCL = if SEX == "Female"
        1 + θsexCLF  # Effect for females
    elseif SEX == "Male"
        1            # Reference (no effect)
    else
        error("Expected SEX to be either \"Female\" or \"Male\" but the value was: $SEX")
    end

    CL = θcl * exp(η[1]) * COVsexCL
    Vc = θvc * exp(η[2])
end

Using ternary expressions (more concise):

@pre begin
    COVsexCL =
        SEX == "Female" ? 1 + θsexCLF :
        (SEX == "Male" ? 1 :
         error("Expected SEX to be either \"Female\" or \"Male\" but the value was: $SEX"))

    CL = θcl * exp(η[1]) * COVsexCL
    Vc = θvc * exp(η[2])
end

In this parameterization, θsexCLF = 0 means no sex effect. A positive value increases clearance in females relative to males.

1.7.5.3.2 Continuous Covariates (✓)

For continuous covariates (e.g., weight), we typically use power models centered at a reference value:

@pre begin
    # Effect of body weight on Vc
    COVwtVc = (WEIGHTB / 70)^θvcWEIGHTB

    CL = θcl * exp(η[1])
    Vc = θvc * exp(η[2]) * COVwtVc
end

Here, θvcWEIGHTB = 0 removes the weight effect (since (WEIGHTB/70)^0 = 1). A value of 1 corresponds to allometric scaling (Vc ∝ weight).

1.7.5.4 Covariate Model Example

Based on EBE diagnostics showing relationships between sex and CL, and between weight and Vc, we build a covariate model:

Show/Hide Code
mdl_cov = @model begin
    @metadata begin
        desc = "Two-compartment model with covariates"
        timeu = u"hr"
    end

    @param begin
        θka  RealDomain(lower = 0.01)
        θcl  RealDomain(lower = 0.01)
        θvc  RealDomain(lower = 0.01, upper = 190.0)
        θq  RealDomain(lower = 0.01, upper = 90.0)
        θvp  RealDomain(lower = 0.01, upper = 190.0)

        θsexCLF  RealDomain(lower = -10.0, upper = 10.0)
        θvcWEIGHTB  RealDomain(lower = 0.1, upper = 10.0)

        Ω_pk  PDiagDomain(4)

        σ_add_pk  RealDomain(lower = 0.0)
        σ_prop_pk  RealDomain(lower = 0.0)
    end

    @covariates SEX WEIGHTB

    @random begin
        η ~ MvNormal(Ω_pk)
    end

    @pre begin
        ηcl, ηvc, ηq, ηvp = η
        Ka = θka

        COVsexCL =
            SEX == "Female" ? 1 + θsexCLF :
            (SEX == "Male" ? 1 :
             error("Expected SEX to be either \"Female\" or \"Male\" but the value was: $SEX"))

        CL = θcl * COVsexCL * exp(ηcl)
        Vc = θvc * (WEIGHTB/77)^θvcWEIGHTB * exp(ηvc)
        Q = θq * exp(ηq)
        Vp = θvp * exp(ηvp)
    end

    @dynamics Depots1Central1Periph1

    @derived begin
        ipred := @. Central / Vc
        CONC ~ @. Normal(ipred, sqrt(σ_add_pk^2 + (ipred * σ_prop_pk)^2))
    end
end
PumasModel
  Parameters: θka, θcl, θvc, θq, θvp, θsexCLF, θvcWEIGHTB, Ω_pk, σ_add_pk, σ_prop_pk
  Random effects: η
  Covariates: SEX, WEIGHTB
  Dynamical system variables: Depot, Central, Peripheral
  Dynamical system type: Closed form
  Derived: CONC
  Observed: CONC

Note the @covariates declaration specifying which covariates the model uses.

1.7.5.5 Fitting the Covariate Model

Validate initial parameters:

Show/Hide Code
initial_cov = (;
    coef(foce_noka)...,
    θsexCLF = 0.1,
    θvcWEIGHTB = 1.0,
)
foce_loglikelihood = loglikelihood(mdl_cov, population, initial_cov, FOCE())
-1358.9167442522842

We reuse parameters from the base model (foce_noka) and add neutral initial values for covariate effects. This approach minimizes disruption to the parameter space—covariate effects start small and are estimated from data.

Choosing Initial Covariate Values

When introducing covariates, set their initial values to “neutral” (zero for additive effects, one for multiplicative effects). This ensures that fixed effect parameters from the base model remain approximately valid. If you set a covariate parameter to a large initial value, you may need to adjust the corresponding fixed effect parameter to compensate.

Fit the model:

Show/Hide Code
oral_model_covar_results = fit(
    mdl_cov,
    population,
    initial_cov,
    Pumas.FOCE();
    optim_options = (show_trace = false,),
)

1.7.5.6 Evaluating the Covariate Model

Compare estimates:

Show/Hide Code
compare_estimates(;foce_noka, oral_model_covar_results)
Table 5: Parameter estimate comparison between base model (foce_noka) and covariate model (oral_model_covar_results). Addition of sex effect on CL and weight effect on Vc reduces random effect variances (Ω), indicating successful explanation of between-subject variability. Fixed effect estimates may shift as covariate effects absorb variability.
13×3 DataFrame
Row parameter foce_noka oral_model_covar_results
String Float64? Float64?
1 θka 1.80817 1.80948
2 θcl 1.30141 1.90963
3 θvc 33.9226 33.8341
4 θq 11.8179 11.8087
5 θvp 77.7969 77.8592
6 Ω_pk₁,₁ 0.174929 0.03793
7 Ω_pk₂,₂ 0.201822 0.202339
8 Ω_pk₃,₃ 0.0297052 0.029917
9 Ω_pk₄,₄ 0.0977277 0.0982499
10 σ_add_pk 0.0150817 0.0150797
11 σ_prop_pk 0.098142 0.09812
12 θsexCLF missing -0.522478
13 θvcWEIGHTB missing 0.1

We expect to see:

  • Fixed effect estimates may change slightly (e.g., θcl adjusts as sex effect is introduced)
  • Random effect variances (Ω) should decrease if covariates successfully explain variability
  • Covariate parameters (θsexCLF, θvcWEIGHTB) with confidence intervals not including neutral values indicate significant effects

Diagnostics:

Show/Hide Code
oral_model_covar_inspect = inspect(oral_model_covar_results)

Indeed, we see the expected change in parameter estimates

Show/Hide Code
coeftable(oral_model_covar_results)
Table 6: Parameter estimates with standard errors and 95% confidence intervals for covariate model. Covariate parameters (θsexCLF for sex effect on clearance, θvcWEIGHTB for weight effect on volume) with confidence intervals excluding neutral values indicate statistically significant covariate relationships.
13×3 DataFrame
Row parameter constant estimate
String Bool Float64
1 θka false 1.80948
2 θcl false 1.90963
3 θvc false 33.8341
4 θq false 11.8087
5 θvp false 77.8592
6 θsexCLF false -0.522478
7 θvcWEIGHTB false 0.1
8 Ω_pk₁,₁ false 0.03793
9 Ω_pk₂,₂ false 0.202339
10 Ω_pk₃,₃ false 0.029917
11 Ω_pk₄,₄ false 0.0982499
12 σ_add_pk false 0.0150797
13 σ_prop_pk false 0.09812

and in the plot of empirical bayes estimates against covariates

Show/Hide Code
empirical_bayes_vs_covariates(oral_model_covar_inspect; 
                              covariates = [:WEIGHTB, :SEX, :DOSE],
                              categorical = [:SEX, :DOSE],
                              color = (:grey, 0.5))
Figure 23: Empirical Bayes estimates versus covariates after covariate model implementation. Trends observed in base model (fig-ebe-vs-covariates-base) are substantially reduced, confirming that sex and weight covariates successfully explain previously unexplained between-subject variability. Residual random effects show minimal correlation with covariates.

After including covariates, the EBE vs. covariate plots should show reduced trends, indicating that covariates have successfully explained variability previously captured by random effects.

Automatic Covariate Selection (→)

Pumas supports automated covariate selection via covariate_select(). This function performs stepwise forward/backward selection. However, automatic covariate selection is computationally expensive and requires a stable base model. It should be used judiciously and results should be validated with clinical and physiological knowledge. Full details are in the documentation at docs.pumas.ai.

1.8 A.5 Parameter Uncertainty Quantification

Before finalizing our PK model, we quantify parameter uncertainty. This is essential for assessing statistical significance, evaluating model stability, and propagating uncertainty to simulations. This corresponds to Stages 9-10 in the workflow map.

1.8.0.1 Standard Errors via Sandwich Estimator

The infer() function computes asymptotic standard errors using the sandwich estimator:

Show/Hide Code
infer_covar = infer(oral_model_covar_results)
infer_table = coeftable(infer_covar)
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Table 7: Parameter estimates with asymptotic standard errors, 95% confidence intervals, and relative standard errors (RSE%) computed using sandwich estimator. RSE% < 30% for fixed effects indicates good parameter precision. Confidence intervals that exclude neutral values (0 for additive, 1 for multiplicative effects) indicate statistically significant parameter estimates.
13×7 DataFrame
Row parameter constant estimate se relative_se ci_lower ci_upper
String Bool Float64 Float64 Float64 Float64 Float64
1 θka false 1.80948 0.0620353 0.0342835 1.68789 1.93107
2 θcl false 1.90963 0.0839004 0.0439354 1.74519 2.07407
3 θvc false 33.8341 2.52024 0.0744884 28.8945 38.7736
4 θq false 11.8087 0.435237 0.0368575 10.9556 12.6617
5 θvp false 77.8592 3.58216 0.0460082 70.8383 84.8801
6 θsexCLF false -0.522478 0.0272342 0.0521251 -0.575856 -0.4691
7 θvcWEIGHTB false 0.1 0.872923 8.72923 -1.6109 1.8109
8 Ω_pk₁,₁ false 0.03793 0.0100196 0.264162 0.0182918 0.0575681
9 Ω_pk₂,₂ false 0.202339 0.0365084 0.180432 0.130783 0.273894
10 Ω_pk₃,₃ false 0.029917 0.0111367 0.372252 0.00808953 0.0517445
11 Ω_pk₄,₄ false 0.0982499 0.0183905 0.187181 0.0622052 0.134295
12 σ_add_pk false 0.0150797 0.00248389 0.164718 0.0102113 0.019948
13 σ_prop_pk false 0.09812 0.00214825 0.0218941 0.0939095 0.10233

1.9 A.6 Pharmacodynamic Modeling

With a well-characterized PK model in hand, we now extend our analysis to pharmacodynamics (PD). Pharmacodynamic modeling links drug exposure (concentrations) to clinical or biological responses. In this section, we demonstrate multi-endpoint modeling, where both PK and PD observations are modeled simultaneously within a single framework. This approach corresponds to Stage 11 in the workflow map, specifically multi-endpoint modeling and sequential PKPD workflows.

We build an indirect response model where drug concentrations inhibit the degradation of a biomarker, with an effect compartment to capture hysteresis and a Hill coefficient for steep exposure-response. We demonstrate two approaches:

  1. Simultaneous PKPD modeling (✓): Estimate all PK and PD parameters jointly
  2. Sequential PKPD modeling (ⓐ): Fix PK parameters and estimate only PD parameters

We model inhibition of kout using an Emax model with Hill coefficient and effect compartment:

Ce’ = Ke0 × (Conc - Ce)

INH = Imax × Ceγ / (IC50γ + Ceγ)

Response’ = kin - kout × (1 - INH) × Response

Where:

  • kin: Zero-order production rate constant
  • kout: First-order degradation rate constant
  • Ce: Effect compartment concentration (introduces hysteresis)
  • Ke0: Effect compartment equilibration rate constant
  • Imax: Maximum inhibition effect (0-1)
  • IC50: Effect concentration at half-maximal inhibition
  • γ (gamma): Hill coefficient controlling curve steepness

At baseline (Conc = Ce = 0), Response’ = 0 and Response = kin/kout.

1.9.1 Simultaneous Multi-Endpoint PKPD Modeling

1.9.1.1 Model Definition

We extend the PK model from the previous section to include PD dynamics. The model includes:

  • PK component: Two-compartment oral model with covariates (from Section XXX[3]XXX)
  • PD component: Indirect response with Kout inhibition, effect compartment (Ce), and Hill coefficient (γ)
  • Multi-endpoint observations: Both CONC (PK) and PD_CONC (PD) in @derived

Note that we now use explicit ODEs in the @dynamics block rather than the name of a standard model. This is necessary because the PD dynamics (Response equation) must be coupled to the PK dynamics (Central compartment concentration).

Show/Hide Code
mdl_pkpd = @model begin
    @metadata begin
        desc = "Simultaneous PKPD model with indirect response (Kout inhibition, effect compartment, Hill coefficient)"
        timeu = u"hr"
    end

    @param begin
        # PK parameters (from covariate model)
        θka  RealDomain(lower=0.01)
        θcl  RealDomain(lower=0.01)
        θvc  RealDomain(lower=0.01, upper=190.0)
        θq  RealDomain(lower=0.01, upper=90.0)
        θvp  RealDomain(lower=0.01, upper=190.0)

        θsexCLF  RealDomain(lower=-10.0, upper=10.0)
        θvcWEIGHTB  RealDomain(lower=0.01, upper=190.0)

        Ω_pk  PDiagDomain(4)

        σ_add_pk  RealDomain(lower=0.0)
        σ_prop_pk  RealDomain(lower=0.0)

        # PD parameters - Kout inhibition with effect compartment and Hill coefficient
        tvkin  RealDomain(lower=0.001, upper=30)
        tvkout  RealDomain(lower=0.001, upper=100.0)
        tvic50  RealDomain(lower=0.01, upper=100)
        tvimax  RealDomain(lower=0.0, upper=1.0)
        tvke0  RealDomain(lower=0.001, upper=10.0)
        tvgamma  RealDomain(lower=0.1, upper=10.0)

        Ωpd  PDiagDomain(2)

        σ_add_pd  RealDomain(lower=0.0)
        σ_prop_pd  RealDomain(lower=0.0)
    end

    @covariates SEX WEIGHTB

    @random begin
        η ~ MvNormal(Ω_pk)
        ηpd ~ MvNormal(Ωpd)
    end

    @pre begin
        # PK parameters
        Ka = θka
        COVsexCL = if SEX == "Female"
            1 + θsexCLF
        elseif SEX == "Male"
            1
        else
            error("Expected SEX to be either \"Female\" or \"Male\" but the value was: $SEX")
        end

        CL = θcl * COVsexCL * exp(η[1])
        Vc = θvc * (WEIGHTB / 77)^θvcWEIGHTB * exp(η[2])
        Q = θq * exp(η[3])
        Vp = θvp * exp(η[4])

        # PD parameters - Kout inhibition with effect compartment
        kin = tvkin
        kout = tvkout
        ic50 = tvic50 * exp(ηpd[1])
        imax = tvimax
        Ke0 = tvke0 * exp(ηpd[2])
        gamma = tvgamma
    end

    @vars begin
        Conc = Central / Vc
        Ce_safe = max(Ce, 0.0)
        INH = imax * Ce_safe^gamma / (ic50^gamma + Ce_safe^gamma)
    end

    @init begin
        Response = kin / kout
        Ce = 0.0
    end
    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL + Q) / Vc * Central + Q / Vp * Peripheral
        Peripheral' = Q / Vc * Central - Q / Vp * Peripheral
        # Effect compartment introduces delay for hysteresis
        Ce' = Ke0 * (Conc - Ce)
        # Kout inhibition with Hill coefficient for steep exposure-response
        Response' = kin - kout * (1 - INH) * Response
    end

    @derived begin
        conc_model := @. Central / Vc
        CONC ~ @. Normal(conc_model, sqrt(σ_add_pk^2 + (conc_model * σ_prop_pk)^2))
        PD_CONC ~ @. Normal(Response, sqrt(σ_add_pd^2 + (Response * σ_prop_pd)^2))
    end
end
PumasModel
  Parameters: θka, θcl, θvc, θq, θvp, θsexCLF, θvcWEIGHTB, Ω_pk, σ_add_pk, σ_prop_pk, tvkin, tvkout, tvic50, tvimax, tvke0, tvgamma, Ωpd, σ_add_pd, σ_prop_pd
  Random effects: η, ηpd
  Covariates: SEX, WEIGHTB
  Dynamical system variables: Depot, Central, Peripheral, Ce, Response
  Dynamical system type: Nonlinear ODE
  Derived: CONC, PD_CONC, Conc, Ce_safe, INH
  Observed: CONC, PD_CONC, Conc, Ce_safe, INH

1.9.1.2 Key Features of Multi-Endpoint Models

  1. Multiple @random distributions: We have separate random effect vectors for PK (η) and PD (ηpd). This allows PK and PD random effects to be independent.

  2. @vars block: Defines intermediate variables (here, Conc) that can be used in the @dynamics block. This improves code readability by avoiding repeated expressions.

  3. Explicit ODEs: The @dynamics block now contains explicit differential equations rather than the name of a standard model. The PD equation depends on Conc, which couples PK and PD dynamics.

  4. Multiple observations in @derived: Both CONC and PD_CONC are specified, each with its own error model. Pumas will match these to the corresponding observation columns in the data.

1.9.1.3 Initial Parameter Values

We reuse PK parameters from the final covariate model and add initial PD parameter values:

Show/Hide Code
init_θ_pkpd = (
    coef(oral_model_covar_results)...,
    # PD parameters - Kout inhibition with effect compartment and Hill coefficient
    tvkin=1.5,          # Production rate (baseline = kin/kout = 50)
    tvkout=0.03,        # Degradation rate constant
    tvic50=12.0,        # IC50 for inhibition (mid-range of concentrations)
    tvimax=0.9,         # Maximum inhibition (90%)
    tvke0=0.08,         # Effect compartment rate constant (creates hysteresis)
    tvgamma=1.5,        # Hill coefficient (moderate exposure-response steepness)
    Ωpd=Diagonal([0.04, 0.04]),  # Variability on IC50 and Ke0
    σ_add_pd=0.5,       # Additive error (scaled for higher response values)
    σ_prop_pd=0.05,     # Proportional error
)

The ... operator unpacks all parameters from the PK model fit, and we append PD-specific parameters.

1.9.1.4 Prior Predictive Check for PD

Before fitting, simulate to check whether PD initial parameters are reasonable:

Show/Hide Code
sim_pd = simobs(mdl_pkpd, population, init_θ_pkpd; rng)
sim_plot(sim_pd; observations=[:PD_CONC],
    markercolor=:grey,
    axis=(ylabel="PD Concentration", xlabel="Time (hours)"),
    figure=(size=(800, 600), fontsize=18),
    legend=(position=:bottom, framevisible=false, labelsize=14, nbanks=1),)
Figure 24: Prior predictive check for PD model showing simulated PD response trajectories using initial parameter estimates. Profiles should be broadly consistent with observed data ranges and patterns. Large deviations indicate need to adjust initial parameter values before model fitting to improve optimization convergence.

This plot shows simulated PD responses at initial parameter values. If the simulated profiles are wildly inconsistent with observed PD data, adjust initial values before fitting.

1.9.1.5 Fitting the Simultaneous PKPD Model

For the simultaneous approach, we estimate all parameters together. However, since we have high confidence in the PK parameter estimates from the previous section, we can fix the PK parameters and estimate only PD parameters. This is done using the constantcoef argument:

Show/Hide Code
pd_ft = fit(
    mdl_pkpd,
    population,
    init_θ_pkpd,
    FOCE();
    constantcoef=keys(coef(oral_model_covar_results)),
    optim_options=(; show_trace=false,),
)

The constantcoef argument takes a collection of parameter names to hold fixed during optimization. Here, keys(coef(oral_model_covar_results)) returns all PK parameter names, so only PD parameters (tvkin, tvkout, tvic50, tvimax, tvke0, tvgamma, Ωpd, σ_add_pd, σ_prop_pd) are estimated.

Simultaneous vs. Sequential Estimation

Simultaneous estimation (removing constantcoef) would re-estimate all PK and PD parameters jointly. This approach:

  • ✅ Accounts for correlations between PK and PD parameters
  • ✅ Provides joint uncertainty quantification
  • ❌ Is more computationally demanding
  • ❌ May be unstable if PK model is not well-identified

Sequential estimation (using constantcoef to fix PK parameters) is appropriate when:

  • PK data are rich and PK parameters are well-estimated
  • PD data are sparse or limited
  • Computational efficiency is important

For this tutorial, we use the sequential approach within a simultaneous model framework by fixing PK parameters. This is just one of many sequential approaches that can be used.

1.9.1.6 Diagnostics for PD

Compute diagnostics:

Show/Hide Code
pd_ins = inspect(pd_ft)

Goodness-of-fit for PD observations:

Show/Hide Code
goodness_of_fit(pd_ins; observations=[:PD_CONC],
    markercolor=(:grey, 0.5),
    legend=(
        position=:bottom,
        framevisible=false,
        labelsize=18,
        patchsize=(20, 10),
    ),
    figure=(size=(800, 800), fontsize=18))
Figure 25: Goodness-of-fit diagnostics for PD model showing four standard diagnostic plots: observed vs population predictions, observed vs individual predictions, conditional weighted residuals vs time, and conditional weighted residuals vs predictions. Random scatter around identity line (predictions) and zero line (residuals) indicates adequate model fit to PD data.

1.9.2 Sequential PKPD Modeling with Parameter Transfer

An alternative to the approach above is true sequential modeling, where:

  1. Fit the PK model to PK data alone (already done in previous sections)
  2. Extract individual PK parameters (empirical Bayes estimates or EBEs)
  3. Treat individual PK parameters as covariates in the PD model
  4. Fit the PD model using only PD observations

This approach completely separates PK and PD estimation and is useful when PK and PD data come from different studies or sampling schemes.

1.9.2.1 Step 1: Extract Individual PK Parameters

From the PK model fit, we can extract individual parameter estimates. One approach is to simulate from the PK model and use the simulated data to construct a new population:

Show/Hide Code
sim_df = DataFrame(pd_ins)

Alternatively, you could extract EBEs directly from the inspect() object and merge them with the dataset.

1.9.2.2 Step 2: Create PD-Only Population with PK Parameters as Covariates

We construct a new population for PD modeling where individual CL and Vc are covariates:

Show/Hide Code
pop_pd = read_pumas(
    sim_df;
    id=:id,
    time=:time,
    observations=[:PD_CONC],
    covariates=[:Ka, :CL, :Vc, :Q, :Vp, :DOSE,],
    # amt, evid, cmt as appropriate for dosing
)
Population
  Subjects: 50
  Covariates: Ka, CL, Vc, Q, Vp, DOSE
  Observations: PD_CONC

1.9.2.3 Step 3: Define PD Model Using PK Covariates

The PD model no longer estimates PK parameters. Instead, it uses individual CL and Vc as covariates:

Show/Hide Code
pk_seq_pd = @model begin
    @metadata begin
        desc = "Sequential PD model with PK parameter covariates (Kout inhibition, effect compartment, Hill coefficient)"
        timeu = u"hr"
    end

    @covariates CL Vc Q Vp Ka DOSE

    @param begin
        # PD parameters - Kout inhibition with effect compartment and Hill coefficient
        tvkin  RealDomain(lower=0.001, upper=30)
        tvkout  RealDomain(lower=0.001, upper=100.0)
        tvic50  RealDomain(lower=0.01, upper=100)
        tvimax  RealDomain(lower=0.0, upper=1.0)
        tvke0  RealDomain(lower=0.001, upper=10.0)
        tvgamma  RealDomain(lower=0.1, upper=10.0)

        Ωpd  PDiagDomain(2)

        σ_add_pd  RealDomain(lower=0.0)
        σ_prop_pd  RealDomain(lower=0.0)
    end

    @random begin
        ηpd ~ MvNormal(Ωpd)
    end

    @pre begin
        # PD parameters - Kout inhibition with effect compartment
        kin = tvkin
        kout = tvkout
        ic50 = tvic50 * exp(ηpd[1])
        imax = tvimax
        Ke0 = tvke0 * exp(ηpd[2])
        gamma = tvgamma
    end

    @vars begin
        Conc = Central / Vc
        Ce_safe = max(Ce, 0.0)
        INH = imax * Ce_safe^gamma / (ic50^gamma + Ce_safe^gamma)
    end
    @init begin
        Response = kin / kout
        Ce = 0.0
    end
    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL + Q) / Vc * Central + Q / Vp * Peripheral
        Peripheral' = Q / Vc * Central - Q / Vp * Peripheral
        # Effect compartment introduces delay for hysteresis
        Ce' = Ke0 * (Conc - Ce)
        # Kout inhibition with Hill coefficient for steep exposure-response
        Response' = kin - kout * (1 - INH) * Response
    end

    @derived begin
        PD_CONC ~ @. Normal(Response, sqrt(σ_add_pd^2 + (Response * σ_prop_pd)^2))
    end
end

init_θ_pkpd_seq = (
    # PD parameters - Kout inhibition with effect compartment and Hill coefficient
    tvkin=1.5,          # Production rate (baseline = kin/kout = 50)
    tvkout=0.03,        # Degradation rate constant
    tvic50=12.0,        # IC50 for inhibition (mid-range of concentrations)
    tvimax=0.9,         # Maximum inhibition (90%)
    tvke0=0.08,         # Effect compartment rate constant (creates hysteresis)
    tvgamma=1.5,        # Hill coefficient (moderate exposure-response steepness)
    Ωpd=Diagonal([0.04, 0.04]),  # Variability on IC50 and Ke0
    σ_add_pd=0.5,       # Additive error (scaled for higher response values)
    σ_prop_pd=0.05,     # Proportional error
)
Warning: Covariate DOSE is not used in the model.
@ Pumas none:3167
(tvkin = 1.5,
 tvkout = 0.03,
 tvic50 = 12.0,
 tvimax = 0.9,
 tvke0 = 0.08,
 tvgamma = 1.5,
 Ωpd = [0.04 0.0; 0.0 0.04],
 σ_add_pd = 0.5,
 σ_prop_pd = 0.05,)

This model uses inhibition of Kout with an effect compartment and Hill coefficient, matching the simultaneous model structure.

1.9.2.4 Step 4: Fit PD Model

Show/Hide Code
ft_seq = fit(pk_seq_pd, pop_pd, init_θ_pkpd_seq, FOCE();
    optim_options=(; show_trace=false), verbose=false)

This fit estimates only PD parameters, treating individual PK parameters as known.

The estimates should generally be the same, but some numerical differences is to be expected.

Show/Hide Code
compare_estimates(; pd_ft, ft_seq)
Table 8: Comparison of PD parameter estimates between simultaneous (pd_ft) and sequential (ft_seq) modeling approaches. Estimates should be similar but not identical due to different uncertainty propagation. Sequential approach treats PK parameters as fixed, while simultaneous approach allows PKPD parameter interactions during estimation.
23×3 DataFrame
Row parameter pd_ft ft_seq
String Float64? Float64?
1 tvkin 1.50157 1.49754
2 tvkout 0.0300331 0.0299498
3 tvic50 12.2191 12.1899
4 tvimax 0.9143 0.914112
5 tvke0 0.0794148 0.0797224
6 tvgamma 1.4131 1.41748
7 Ωpd₁,₁ 0.0465215 0.0461238
8 Ωpd₂,₂ 0.00970952 8.5971e-8
9 σ_add_pd 0.00594804 0.00124218
10 σ_prop_pd 0.0509662 0.05096
11 θka 1.80948 missing
12 θcl 1.90963 missing
13 θvc 33.8341 missing
14 θq 11.8087 missing
15 θvp 77.8592 missing
16 θsexCLF -0.522478 missing
17 θvcWEIGHTB 0.1 missing
18 Ω_pk₁,₁ 0.03793 missing
19 Ω_pk₂,₂ 0.202339 missing
20 Ω_pk₃,₃ 0.029917 missing
21 Ω_pk₄,₄ 0.0982499 missing
22 σ_add_pk 0.0150797 missing
23 σ_prop_pk 0.09812 missing

1.9.2.5 Advantages and Disadvantages of Sequential Modeling

Advantages:

  • ✅ Computationally efficient (smaller models, faster fits)
  • ✅ Allows separate teams to work on PK and PD
  • ✅ Useful when PK and PD data are from different sources
  • ✅ Simplifies model development (debug PK and PD separately)

Disadvantages:

  • ❌ Does not propagate PK parameter uncertainty to PD estimates
  • ❌ Cannot estimate PKPD parameter correlations
  • ❌ May underestimate overall uncertainty in simulations
  • ❌ Requires extra data manipulation to transfer parameters

For most analyses with rich PK and PD data from the same study, the simultaneous approach (with or without constantcoef) is preferred. Sequential modeling is most valuable when data sources differ or when computational constraints are severe.

When to Use Sequential Modeling

Consider sequential PKPD when:

  • PK and PD data come from different studies or phases
  • PD sampling is very sparse relative to PK
  • You need to rapidly prototype PD models without re-fitting PK
  • Computational resources are limited

For comprehensive final analyses, simultaneous estimation is generally recommended to properly account for uncertainty.

1.9.3 Summary: Pharmacodynamic Modeling

In this section, we extended our PK model to include pharmacodynamic endpoints, demonstrating multi-endpoint modeling approaches:

Simultaneous PKPD Modeling (✓)

  • Built an indirect response model with Kout inhibition, effect compartment, and Hill coefficient
  • Used explicit ODEs in @dynamics to couple PK and PD
  • Specified multiple observations in @derived (CONC and PD_CONC)
  • Fixed PK parameters using constantcoef to estimate only PD parameters
  • Applied standard diagnostic workflows to PD observations

Sequential PKPD Modeling

  • Outlined an alternative approach using individual PK parameters as covariates
  • Demonstrated complete separation of PK and PD estimation
  • Discussed trade-offs between simultaneous and sequential approaches

With a validated PKPD model, we’re now ready to proceed to simulation for decision support in the next section, where we’ll evaluate alternative dosing regimens and assess target attainment metrics to support dosing decisions.

Next Steps: Simulation for Decision Support

In the next section, we will:

  1. Create alternative dosing regimens using DosageRegimen()
  2. Simulate concentration and response profiles with simobs()
  3. Calculate target attainment metrics from simulated data
  4. Compare scenarios to identify optimal dosing strategies

This represents the culmination of the pharmacometric workflow—translating model-based understanding into actionable clinical recommendations.


1.10 A.7 Simulation for Decision Support

1.10.1 Clinical Question and Simulation Strategy

Pharmacometrics is inherently different from clinical statistics or biostatistics and one big difference is the focus on model building in pharmacometrics. Models are useful because they allow us to simulate scenarios to answer counter factual questions. This can be used to design clinical trials, apply for clinical trial waivers and more.

We will show how simulations can be implemented in Pumas to evaluate six alternative dosage regimens: 0 (placebo), 100, 200, 400, 800, or 1600 mg PO daily for 14 days. We are interested in three metrics.

The probability of obtaining a PD_CONC within the therapeutic range (80-150 units) at any time during treatment. The time needed to reach the first therapeutic PD_CONC value. The total time spent in the TR as a percentage of the dosing interval (i.e., 24 hours) at SS (Day 14).

The estimated population half-life for warfarin per our model is ~41 hours which means it should take roughly 9 days (on average) to achieve steady-state; we extend this to 14 days to ensure each of our virtual subjects is at SS prior to evaluation. We will generate a Population of 100 Subjects and use it simulate a total of 600 trials (200 per dosage regimen).

We will not include RUV, since most variability comes from BSV and RUV can make results difficult to interpret.

Show/Hide Code
tr_low, tr_high = 75, 125
(75, 125)
Layer 
  transformation: identity
  data: Nothing
  positional:
  named:
    color: :DOSE => (AlgebraOfGraphics.Renamer{Vector{Int64}, Vector{String}}([0, 100, 200, 400, 800, 1600], ["0 mg", "100 mg", "200 mg", "400 mg", "800 mg", "1600 mg"]) => "")
    col: :DOSE => AlgebraOfGraphics.Renamer{Vector{Int64}, Vector{String}}([0, 100, 200, 400, 800, 1600], ["0 mg", "100 mg", "200 mg", "400 mg", "800 mg", "1600 mg"])
Show/Hide Code
# combine layers and draw
fig6_pta = (data(_tbl) * (pi_layer + median_layer) * cf_map) + tr_layer |> draw(
    scales(;
        Y=(; label="Biomarker level"),
        X=(; label="Time after previous dose, hours"),
        secondary=(; palette=[:gray30]),
    );
    figure=(;
        size=(6, 4) .* 96,
        title="Predicted Biomarker Profiles at Dose Number 6 ",
        subtitle="Median (line), 90%PI (band), TR (dash-line)",
        titlealign=:left,
    ),
    axis=(;
        limits=(0, nothing, 0, nothing),
        xticks=0:6:24,
        xlabelpadding=10,
        #        yticks = 0:20:80,
    ),
    legend=(; orientation=:horizontal, framevisible=false, position=:bottom, nbanks=2, labelsize=10,),
)
# save("./figures/fig6-pta.png", fig6_pta, px_per_unit=2)
fig6_pta
Figure 26: Simulated steady-state PD biomarker profiles (Day 14, dose number 6) for six alternative dosing regimens (0 to 1600 mg). Each panel shows median (line) and 90% prediction interval (shaded band) across 100 simulated trials. Dashed horizontal lines indicate therapeutic range (80-150 units). Higher doses achieve greater biomarker response and longer time in therapeutic range, but may exceed upper target threshold.
600×5 DataFrame
575 rows omitted
Row DOSE rep_id ta tta ttr
Int64? Int64 Float64 Float64? Float64?
1 0 1 0.0 missing missing
2 100 1 0.0 missing missing
3 200 1 0.0 missing missing
4 400 1 50.0 120.1 99.1667
5 800 1 90.0 120.1 99.1667
6 1600 1 80.0 120.1 57.7083
7 0 2 0.0 missing missing
8 100 2 0.0 missing missing
9 200 2 0.0 missing missing
10 400 2 90.0 124.933 79.0278
11 800 2 100.0 120.1 94.2083
12 1600 2 60.0 120.1 90.9028
13 0 3 0.0 missing missing
589 0 99 0.0 missing missing
590 100 99 0.0 missing missing
591 200 99 0.0 missing missing
592 400 99 60.0 122.083 90.9028
593 800 99 100.0 120.1 99.1667
594 1600 99 70.0 120.1 66.0119
595 0 100 0.0 missing missing
596 100 100 0.0 missing missing
597 200 100 0.0 missing missing
598 400 100 50.0 124.46 81.0
599 800 100 100.0 122.48 89.25
600 1600 100 80.0 120.1 47.3958

1.10.2 Target Attainment Analysis

We evaluate three key efficacy metrics across dose levels:

  1. Probability of Target Attainment (TA): Proportion of subjects achieving PD response within therapeutic range (80-150 units) at any time during treatment
  2. Time to Target (TTA): Mean time (hours) to first observation within therapeutic range
  3. Time in Therapeutic Range (TTR): Percentage of 24-hour dosing interval at steady-state spent within therapeutic range
Show/Hide Code
table_one(
    dfproccessed,
    [
        :ta => war_metrics => "Probability of TA",
        :tta => war_metrics => "Time to Target",
        :ttr => war_metrics => "Time in TR",
    ],
    sort=false,
    groupby=:DOSE => "Dose, mg",
    show_total=false,
)
Table 9: Target attainment metrics by dose level summarized across 100 simulated trials. Median and 95% confidence intervals shown for: probability of target attainment (TA, percent of subjects reaching therapeutic range), time to target (TTA, hours to first therapeutic observation), and time in therapeutic range at steady-state (TTR, percent of dosing interval). Higher doses increase TA and TTR but may risk exceeding upper therapeutic threshold. Optimal dose balances efficacy and safety considerations.
Dose, mg
0 100 200 400 800 1600
Probability of TA
Median 0 0 0 60 100 70
95% CI [0, 0] [0, 0] [0, 10] [30, 80] [80, 100] [50, 90]
Time to Target
Median - - 133 124 120 120
95% CI - - [120, 143] [120, 130] [120, 123] [120, 120]
Time in TR
Median - - 45.4 83.4 94.2 69.9
95% CI - - [1.84, 99.2] [56.8, 99.2] [80.2, 99.2] [48.7, 92.9]
Interpreting Simulation Results for Dosing Decisions

These simulation results support dose selection by:

  • Quantifying trade-offs between efficacy (target attainment) and safety (exceeding upper threshold)
  • Identifying optimal doses for achieving therapeutic targets in majority of patients
  • Providing evidence for regulatory submissions and protocol design
  • Enabling scenario comparisons without additional clinical trials

The final dose selection should integrate these model-based predictions with clinical judgment, safety data, and practical considerations (e.g., formulation constraints, dosing convenience).


1.11 A.8 Installation and Setup

Pumas is available through multiple deployment options to suit different organizational needs and computational requirements.

1.11.1 Pumas Desktop

For individual users and small teams, Pumas Desktop provides a complete local installation with VS Code integration. This option is ideal for:

  • Individual pharmacometricians and researchers
  • Educational and training purposes
  • Development and prototyping workflows

Installation instructions: https://docs.pumas.ai/stable/platformdetails/desktop/

1.11.2 Pumas Cluster

For teams requiring shared computational resources and collaborative workflows, Pumas Cluster provides a managed high-performance computing environment. This option supports:

  • Team-based collaboration with shared projects
  • High-performance computing for large-scale simulations
  • Centralized package and environment management

Setup and configuration: https://docs.pumas.ai/stable/platformdetails/pumascluster/

1.11.3 Enterprise Cloud Platform

For organizations requiring enterprise-grade security, scalability, and support, PumasAI offers a fully managed cloud platform with:

  • Enterprise security and compliance (SOC 2, HIPAA-ready)
  • Dedicated support and training
  • Custom integrations and workflows
  • Scalable computational resources

For enterprise inquiries: Contact sales@pumas.ai


1.12 A.9 Additional Resources

The Pumas ecosystem provides comprehensive resources for learning, troubleshooting, and staying current with platform developments.

1.12.1 Documentation

1.12.2 Tutorials and Learning

1.12.3 Community and Support

1.12.4 Reproducible Research